Summary of Mathematical Doctoral Thesis: Some improvements of string contraint solving in automated test cases generation for symbolic execution

MINISTRY OF EDUCATION AND

VIETNAM ACADEMY

TRAINING

OF SCIENCE AND TECHNOLOGY

GRADUATE UNIVERSITY SCIENCE AND TECHNOLOGY

----------------------------

To Huu Nguyen

SOME IMPROVEMENTS OF STRING CONTRAINT SOLVING IN AUTOMATED TEST CASES GENERATION FOR SYMBOLIC EXECUTION

Major: Mathematical foundations for Informatics

Code: 9 46 01 10

SUMMARY OF DOCTORAL THESIS

HaNoi - 2020

This thesis is completed at: Graduate University of Science and Technology, Vietnam Academy

of Science and Technology

Supervisors 1: Dr. NGUYEN TRUONG THANG Supervisors 2: Assoc.Prof.Dr. DANG VAN DUC

Reviewer 1: ……………………………………………………………………… ……………………………………………………………………… Reviewer 2: ……………………………………………………………………… ……………………………………………………………………… Reviewer 3: ……………………………………………………………………… ……………………………………………………………………… The thesis is to be presented to the Defense Committee of the Graduate University of Science and Technology – Vietnam Academy of Science and Technology At Date Month Year 2020

The thesis can be found at:

- Vietnam National Library

- Library of Graduate University of Science and Technology

1

INTRODUCTION

In general, for each technique used in dynamic run-time testing, any software can be separated into two stages: preparing the test cases for software testing and performing the program on a background that supports to testing frameworks for available test cases.

One of important activities in order to reduce testing cost is to generate test cases automatically and completely. The software developing organizations often pay a large amount for activities related to software testing. The effectiveness of verification and validation progress depends on the number of errors that has been found and fixed before transferring the software. It means that the quality of a software depends strictly on the quality of generated test cases.

In past years, many researches from international scientists focused to “automatic test case generating” [3, 4] in order to reduce the cost of software developing. There are two main approaches in this topic including testing based on code and testing based on model. Testing based on code has a high coverage. This approach can remove unnecessary instructions that may cause the errors. This approach is also optimizing the computations of testing analysis softwares. Thus, many recent researches concentrate to testing based on code [5, 6, 7].

There are also many researches in automatic test case generating such as test case generating based on specification, test case generating based on model, test case generating by path approach and smart technique. However, testing based on symbolic execution is an attractive research strategy. The basic test case generating techniques were proposed including test case generating by theorem proving [8, 9], test case generating by symbolic execution [5, 7,10,11,12], test case generating by model checking [13], test case generating by an event-flow model [14], test case generating using a Markov chains model [15]. In which, symbolic execution gets the attention of many researchers. Symbolic execution is used to develop and construct many applications [5, 6, 7] on various tools and programming languages such as C/C++, JavaScrip, .NET, java, HTML, etc.

In this thesis, some improvements in string constraint solvers applied in test case generation are studied.

The experiments and analysis are evaluated using Java, a strong, modern programming language.

The research objective:

- To study the methods used to modeling constraints and improve constraint solving capability. - To apply symbolic execution technique in automatic test case generating. - To implement proposed methods in automatic test case generating on string constraints and mixed

constraints.

- To analyze and evaluate the obtained results.

Object and scope of the research:

This thesis aims to research on overview of automatic test case generating, symbolic execution applied into automatic test case generating. The modeling techniques based on Automata and Bitvector are also studied in this thesis. Apart from that, the analysis and evaluation of available test case generating methods on different constraint are mentioned. The quality and effectiveness of test cases generated by using symbolic execution are assessed. Content of the research: - The automatic test case generating techniques, the contents related to constraint modeling on different

types of constraints. Improvements of constraint modeling and constraint solving on string constraints.

- - Applying these improvements in symbolic execution to generate test cases automatically on string

constraints for testing programs.

Research methodology:

2

- Study, analyze, synthesize the documents related to symbolic execution, the role of constraint solving and string constraint solvers in symbolic execution to generate test cases automatically (papers from journals, conferences, etc). Improve the constraint modeling methods and constraint solving capability on string constraints and mixed constraints.

- Analyze and evaluate the published research results.

The main contributions:

Improved the capability of constraint solving in symbolic execution. Implemented constraint modeling techniques and constraint sovling based on automata and bitvector.

- Constructed a constraint modeling on string and mixed constraints. - - - Compared the obtained results with available researches.

The structure of the thesis:

The structure of this thesis is organized as below Chapter 1 presents overview of software tesing and symbolic execution techniques in automatic test case generation. The basic theory related to contents of thesis is also given in this chapter. Based on mentioned theory, the research problem and research approaches are stated.

Chapter 2 shows the results of researches on constraint modeling, constraint solving in symbolic execution. The applications of these tools in typical situations and the evaluation of these methods on typical data types.

Chapter 3 introduces the results of researches of improvements in constraint modeling, the constraint solving in symbolic execution on string constraints and mixed constraints. An extension of symbolic execution technique and the string constraint solving based on automata are also presented in this chapter.

The conclusions of this thesis address the main contributions, the future works and the list of references are

presented in the thesis.

3

CHAPTER 1. OVERVIEW OF SOFTWARE TESTING

AND SYMBOLIC EXECUTION

This chapter presents the overview of fundamental aspects of software testing, symbolic execution and applications of symbolic execution in generating test cases. The contents of this chapter mainly focus on basic concepts and analysis of symbolic execution. Moreover, some challenges and future strategies are also presented in this chapter.

1.1. Software testing

Ensuring a software system or a component of a software system work as expected is a big challenge in software industry. The mistake softwares cause the loss of economy and other serious effects on the fields that the softwares are used.

Figure 1.8. Relation between software development and software testing

1.1.2 Testing techniques [24]: - Black box testing - White box testing - Unit test

1.2. Control flow white box testing technique

Control flow white box testing is divided into static and dynamic testing. The input of these two strategies are the source code of program and testing coverage criteria. After analyzing progress, the output is a set of test cases. Each strategy has the advantages and disadvantages. The objective of control flow white box testing technique is to find the minimum set of test cases that gets the highest coverage.

4 c

sequentially if…else do….while while….do for

Figure 1.6. Common control structures

1.3. The comparison of static and dynamic control flow white box testing technique

The comparison of these static and dynamic control flow white box testing technique in term of the advantages and disadvantages is presented in the thesis. The number of test cases, the run time needed in test case generating, the loop test capability and the effects of the complexity in source code are used in comparison. 1.4. The challenges in software testing

There are some the wrong ideas in software testing such as a good test case is always the high complexity; automatic testing can replace software engineers in order to test the software well; software testing is a simple and easy work; everyone can test a software without any training.

Software testing meets other limitations. The correction of software specification, the accuracy of tesing system are not guaranteed. There is no testing tool that is suitable for every software. Software engineers can not sure about the understand of software product. The resource for complete testing is not enough. We can not perform the testing activities completely.

1.5. Symbolic execution 1.5.1. Overview of symbolic execution

In computer science, symbolic execution is a tool used in analysing a program in order to define the input of each component in that component. A compiler of a program get a symbolic value as input instead of a crispt input as usual.

def f(x,y); if( x>y); x=x+y y=x-y x=x-y if (x-y>0); assert false return (x, y)

Figure 1.7. Example of a concrete idea of a symbolic execution

5 1.5.2. Static symbolic execution

The main idea of symbolic execution [31] is to execute the program with symbolic values instead of

typical vales of input parameters.

Symbolic values of variable x can be represented by: (a). An input notation. (b). A combined expression among symbolic values using operators. (c). A combined expression between symbolic values and typical values using operators.

Figure 1.8. Symbolic execution tree

1.5.3. Dynamic symbolic execution

Dynamic symbolic execution [32, 33] is symbolic execution technique based on dynamic program analysis. Dynamic symbolic execution is the combination of typical execution and symbolic execution. In dynamic symbolic execution, the program is executed many times with different values of input parameters.

Table 1.1. An example of a dynamic symbolicexecution.

Constraints are remembered Contrains C’(i) Input i C(i)

null a = = null

a!=null {} a!=null && !(a.length>0)

a!=null && a.length>0 a!=null&& a.length>0 {0} && a[0]!=123456789

a!=null && a.length>0 && a!=null && a.length>0 {123456789} && a[0]!=123456789 a[0]!=123456789

1.5.4 Concolic execution Concolic execution is a typical performance model. Concolic execution is a modified version of symbolic

execution. DART [36] and CUTE [35] are two tools of concolic execution. 1.6. The conclusion of chapter 1

Chapter 1 of the thesis presents the fundamental concepts of software testing techniques, basic techniques of automatic test case generating from the models. Apart from that, the comparison of static white box testing with dynamic white box testing is also stated. Moreover, the challenges of software testing is given

6 in this chapter. The basic aspects, overview of symbolic execution methods and unresolved problems are presented.

CHAPTER 2. SYMBOLIC EXECUTION AND CONSTRAINT MODELING

In this chapter, the difficulties of symbolic execution are analysed and assessed. Besides, the extended tools, the installation of symbolic execution techniques used in improving string constraint modeling technique are also presented. The modeling methods used in installation, evaluation of string constraint and mixed constraint solving methods are addressed in this chapter.

The contents of this chapter are introduced in published results of PhD student, including [CT1], [CT2],

[CT3]. 2.1. The difficulties and some solutions In this section, the difficulties in applying symbolic execution are discussed. Some solutions used to overcome

the challenges are presented. 2.1.1. Path exploration

One of the biggest challenges is path exploration. It means that a large number of paths is generated even though the tested program is small. The number of paths is often by a power function. In a certain perioid of time, the testing program has to discover the first path that is the most suitable. 2.1.2. Memory modeling

The accuracy of instructions in program when transferring to symbolic constraints affects significantly to the coverage of symbolic execution and the constraint sovling capability as well. For example, in order to establish the range of integer parameters, a memory modeling technique is more effective than the normal way. However, in this case, the accuracy of code analysis is worse. It depends on the selection of value range. For example, math memory over flow error can cause the lack of paths in symbolic execution or mining the impossible paths, etc.

2 .2. Symbolic execution and extended tools 2.2.1. Symbolic execution and software testing Test case automatic generating techniques used symbolic execution are installed on Java to generate test data for programs. However, the performance of available tools has their own disadvantages. Constraint optimization algorithm is applied in order to reduce the execution time and guarantee the effectiveness of coverage. This solution is implemented on different Jave programs and gets the good results. This section presents the aspects of Java PathFinder (JPF) including the checking ability of JPF, high level architecture of JPF and extension capability of JPF.

2.2.2. Symbolic execution on Java programming language Introduction to Java PathFinder (JPF)

Nowadays, symbolic execution techniques is performed by various tools on different programming languages such as DART[36], CUTE[35] on C/C++ source code or PEX[36, 37] on .NET source code. In this thesis, path finding is set up on Java because Java is a programming language that independs on environment. Java is also used in many important applications. Moreover, JPF is a powerful tool that support for symbolic execution. This leads to the widely extension ability and a huge developmental community of JPF.

7

JPF is a verification set of script state model for Java [68]. Not same as normal virtual machine, JPF is a virtual machine performing Java programs more than once. JPF executes on all branches, all possible paths of a program. JPF checks the range of attributes such as the exceptions that can not catch through potential execution paths.

Figure 2.1 describe the model of JPF

Figure 2.1. The model of JPF

Checking ability of JPF JPF can check all Java programs. It can find dead key or exception. Apart from that, JPF can develop

and extend by itself to test other attributes.

High level architecture of JPF

JPF is designed by two main components, including JVM and Search (as in Figure 2.3 below). In which, JVM is a Java typical state generation by executing Java bytecode statements. Search is used to choose the state that JVM should process, to direct JVM in generating the next state or to request JVM to go back a previous state.

8

Figure 2.3. High level architecture of JPF.

The main installation of Search includes Deep first search (DFSearch) and HeuristicSearch.

Extension ability of JPF

JPF is considered as a framework that each developer can extend it in order to support for a typical purpose. JPF supplies an extension mechanism for developers in order to allow to add new functions without changing directly the installation of Search or JVM

With this flexible extension structure, there are several extensions of JPF.

2.3. Constraint solving and symbolic execution Related to generating test cases, genetic algorithm approach has been studied in research work [CT[1] of

PhD student. However, this approach containts many limitations.

Although many improvements of constraint solving techniques have been proposed, the constraint solving is one of big obstacles of symbolic execution. Constraint solvers often take a high cost in run time. Thus, symbolic execution can not extend in some kind of program which the source code requires a high performance in constraint solving.

To overcome this challenge, the optimization step in constraint solving is necessary in symbolic execution. This is very effective in real programs. There are two optimization methods used in modern tools, including:

- Removing unrelated constraints - Constraint solving using incremental methods (Increamental Solving) [71]

In constraint sets that are generated in symbolic execution, the representation for a set of static branches from source code is one of the most important properties. In KLEE, all query results is stored in buffer mapping the set of constraints and the typical values corresponding to those constraints. The special case happens when there are no solution or no satisfying a special notation.

9 2.4. Mixed constraints and improvement in string constraint solving

In fact, string constraints appear in most of programs, especially in programs or procedures that clean strings in input data in order to test the validity, to discover and to remove harmful contents. String data in Jave is equipped by a huge library. String data conceals many errors.

To accomplish testing capability and error discovering ability in Java programs with string constraints, the thesis focuses to the progress of defining errors in order to get the better programs. By applying software testing on string constraints, the coverage ability increases and then the quality of software is also improved. Why does symbolic execution on string data meet difficulty?

The operators on string data are the mix of two value domains: string and integer. For example, we need to get the nth character in a string. In present, there are several solutions of symbolic execution on string data. But these solutions have just support for a subset of these operators. The iterative approach is chosen. First, we sovle the constraints on integer data. Then, the obtained results are used to solve the string constraints. If string constraints are satisfied, the task is complete. Otherwise, integer constraints are added into the constraint set. This procedure is repeated until there is no unsatisfied string constraint.

The main idea for proposing the extension of symbolic execution is presented as in Figure 2.7 below.

Figure 2.7. An example a string graph

The iterative progress is performed by two following steps: Step 1: Current solver of JPF is called to process integer constraints, real constraints, logic constraints,… If all these constraints is solved, the step 2 is performed. Step 2: + The typical values obtained from Step 1 are loaded in string graph. + The string solver is called to solve these constraints. In this case, string solver can be AutomataSolve or BitVectorSolve. The constraints of string graph may not satisfied with input values. Then, if there is enough time, Step 1 is run and string constraint set is added by a new integer constraint

1: 2: 3: 4: 5:

Solve(PathCondition pc){ StringGraph sg; sg=BuilStringGraph(pc); boolean sat=false; While(!sat and !timeout){ sat=CurrentSolver(pc,sg); if(sat) 6:

7: 8: sat=StringSolver(pc,sg); return sat; } }

10

1: 2: 3: 4: buildStringGraph(PathCondition pc) { StringGraph sg ; for string or mixed constraint c 2 pc: sg= sg hyperedge(c) return preprocess(pc; sg)

}

Figure 2.8. Mixed constraint solver algorithm

The limitation of time “timeout” is added in order to guarantee the combination of string solver and integer solver can not give the final conclusion in a allowable time. If it reaches to “timeout”, the algorithm stops, the given constraint is concluded as unsatisfied. 2.4.1. String graph

There are some types of constraints including string constraints, integer constraints, string and integer mixed constraints. In the installation progress, string constraints and mixed constraints are represented by special graph called as string graph.

Definition: String graph is a direct graph denoted by H=(X, E, Ʊ). In which, the vertex set is defined by X=IS where I and S in separated sets (I presents for integer variables and S presents for string variables). Ʊ presents for the operators on string variables. E presents for labeled direct edges in graph (E = E1 E2 E3 E4) satisfying En= Ʊ Xn. Using graph is a popular method in constraint satisfying programming. Figure 3.1 describe an example

for string graph corresponding to constraint:

s1.trim().equals(s2)^s1.equals(s3.concat(s2))

2.4.2. Constraint re-construction

Preparing for pre-processing progress, for each string constraint or mixed constraint of path constraint condition, considered as an edge of string graph, the graph re-constructs constraints by constraints and returned values of operators that are mapped directly into the edges of string graph.

For example, the constraint s1.equals(s2) is added to edge (equals, s1,s2) with other operators. The returned value can be a char, an integer or a string. An extra variable is created and represent for result of operator. This extra variable is added as a vertex of string graph. The graph allows to add an edge as a returned value.

In the example showed in Figure 2.7, each circle is a vertex of string graph. The dots and lines represent for edges in this graph. Each edge is labeled by the operator and the number on each edge shows the order of vertex in string graph corresponding to a constraint.

The string variables s1, s2 and s3 appear as the vertex, elements of S. In this graph, Ʊ = {trim, equals, concat}. Operator trim creates an extra variable s‟ added into S. The operator is the edge (trim, s1, s‟) is labeled by element trim of Ʊ. This edge connects s1 and s‟. The operator concat is the same. It is the second constraint that creates extra variable s” and edge (concat, s3, s2, s”). The operator equals corresponds to two last edges. When a string graph is completed, it is processed by two following ways:

1. 2. Define the constraints that have no satisfied values in a simple way. Add the values generated by integer solver in path constraint expression.

11 2.4.3. Pre-processing progress

When unsatisfied constraints are determined, execution progress will skip string constraint solving. Consequently, time consuming decreases. Eliminating equals edge from string graph is one of proposals. In above example, string graph is adjusted to new graph as shown in Figure 2.9.

Figure 2.9. String graph obtained by removing equals operators.

This also supports to define rapidly set of constraints represented in string graph that have no satisfied value. For example (shown in Figure 2.10), after removing edge in Figure 2.10.(a), vertex s1 connects to itself by notEquals edge. This connection is presented in Figure 2.10.(b). Then, there is not existing any string which satisfies this constraint.

Most of proposals are introduced to deal with the constant operators. For example, the constraint s1.concat(“xyz”) = ”vwxyz” will lead to the elimination of the complex edges. Then, vertex s1 is replaced by string “vw”. Another example, the constraint s1.startsWith(“abc”).chatAt(0) ≠ „a‟ is not suitable and it is easy to discover. If this situation appears in pre-processing stage, the use of string constraint solver is unnecessary and false value is returned.

Figure 2.10. Unsatisfied constraints after removing equals operators

2.4.4. String constraint generating and the implemental results

Before the constraints are generated, new vertices are added in string graph. For each vertex that is not a string constant SiS, an integer variable liI is added in order to represent for the length of string Si. Figure 3.4 shows that the new vertex of string graph in Figure 3.2 is connected to corresponding string vertices by dashes.

12

Figure 2.11. New vertices represent for the length of added string

Other constraints are based on the edges as shown in Table 2.1.

Table 2.1. Constraint construction for operators on string.

Java operators Heper-edge

For each constraint of vertex is not a string constant Si S Sa.charAt(n) Sa.concat(Sb) Sa.indexOf(Sb) Sa.lastIndexOf(Sb) Sa.subString(n,k) Sa.contains(Sb) charAt,Sa,n,c concat, Sa , Sb, Sx indexOf, Sa, Sb,x lastIndexOf, Sa, Sb,x subString,Sa,n,k,Sx contains,Sa,Sb New constraint li>0 la≥n lx=la+lb (x=-1)˅(la≥lb+x) (x=-1)˅(la≥lb+x) la=n+lx la≥lb

2.4.5. Constraint solving using automata

Constraint solvers is popular tools used to deal with many problems in reality such as theorem proving, time scheduling in real time. Symbolic execution in automatic generating test cases is used in research works CT[2], CT[3] by PhD student. Many researchers reduce the limitations of available constraint solvers in order to improve the applications in symbolic execution, especially in test case generating. Although there are various improvements, the constraint solvers are not effective with some certaint constraints.

In computer programs, string data appears frequently, especially in the program components or procedure. The Java programming language provides a huge library of string data types but still has many errors. In software testing, identifying bugs is concerned. In symbol execution, string constraints are also considered in most of cases. Deterministic Finite Automaton (DFA) is used to solve string constraints based on specific algorithms (CT[5]).

2.5. The conclusion of chapter 2

This chapter has presented the issues related to symbol execution and extension tools with an effective role in test case automatic generation. The main content related to symbol execution includes various symbol

13 execution techniques; advantages and disadvantages of these techniques and the processings that need to apply symbolic execution in software testing progress. Chapter 2 also addresses the difficulties in implementing the symbolic execution and several solutions are pointed out in this chapter. Constraint modeling approaches using string graphs, constructing constraints on string graphs, and preprocessing to detect unsatisfied constraints are presented in this chapter. The contents presented in chapter 2 are the research results from the works CT[1], CT[2], CT[3], CT[5].

CHAPTER 3. STRING CONSTRAINT SOLVING

In this chapter, the thesis presents a proposal to improve string constraint solving in symbol execution based on automata theory and apply it on string data type. Experimental results, evaluation of text data processing of the improved model with some existing models are also shown in this chapter. The contents of this chapter are published in [CT4], [CT5], [CT6] of PhD student.

3.1. String constraint solver

Many applications depend on the handling of a text data type. Since the use of the Internet has become common with the proliferation of interactive applications, this problem becomes more serious. Interactive applications include social media, chat, sensitive and important information management, personal information management.

The input data of the programs is used in SQL query statements and stored in database management systems. This allows users to access directly to the database. The reason is that the input data is open and public and it is also open to other objects. Some objects have bad intentions, so "cleaning" the text input is a requirement that appears mostly in web services.

The current methods of applying symbol execution on string code are divided into two groups: based on

automata [5, 53] and based on Bitvector [7]. 3.2. Bitvector and satisfiability modulo theories (SMT) 3.2.1. Satisfiability modulo theories (SMT)

The study of this thesis uses the SMT Z3 constraint solver [70] constructed by Microsoft. This study also considers to the use of CVC [25] but realizes that it is slower than Z3 because of using the SMT-lib2 standard for the bitvector constraint. The thesis also applies alternative symbols for addressing by byte instead of by bit. For example, a [0: 7] = 01100010 will be replaced with a [0] = 'b'. Constraint resolving by bitvector is beyond the scope of this study. 3.2.2. String constraint solving based on BitVector

Consider the example where the edge (contains, Sa, Sb) corresponds to the constraint Sa.contains(Sb), it is provided that the current estimate for the lengths of Sa and Sb is 5, 3 respectively, the constraint result for this edge is:

(Sa[0]=Sb[0])˄(Sa[1]=Sb[1]) ˄ (Sa[2]=Sb[2]) ∨((Sa[1]=Sb[0]) ˄ (Sa[2]=Sb[1]) ˄ (Sa[3]=Sb[2]) ∨((Sa[2]=Sb[0])˄(Sa[3]=Sb[1])˄(Sa[4]=Sb[2]) If the platform of constraint solver supports it can be simplified the array type,

to (Sa[0:2]=Sb)∨(Sa[1:3]=Sb) ∨(Sa[2:4]=Sb). The process of compiling the edges of the string graph into the BitVector constraints is given in the second column of Table 3.1

14

Table 3.1. Construction of constraint corresponding to string operators.

Edge Automation formula Bitvector constraint

(charAt,sa,n,x) (concat,sa,sb,sx) Sa[n]=x (sa=sx[la:0]) (sb=sx[lx:la])

(indexOf, sa,sb,x)

(lastIndexOf, sa,sb,x) (sa[x+lb:x]=sb) sa[i+lb:i] sb, i,0 i

Ma= Ma [n] {x} [*] Ma= Ma substring(Mx,0,la) Mb=Mb subString(Mx,lb, ) Mx=Mx (Ma Mb) Ma= Ma ([x] Mb) Mb [*] Mb=Mb substrings(Ma,x,x+lb) Ma= Ma [*] Mb ([x+lb] ] ([*] Mb [*])) Mb= Mb substrings(Ma,x,x+lb)

sa[n+lx:n]=sx substring(sa,n,sx) Ma= Ma [n] Mx

Mx= Mx substring(Ma,n,∞)

Sa[n+lx:n]=sx substring(sa,n,k,sx) Ma= Ma [n] Mx [*]

trim(sa,sx)

Mx= Mx substring(Ma,n,k) Ma= Ma [„˽‟,*] Mx [„˽‟,*] Mx= Mx trim(Ma)

(contains, sa,sb) (sx[0] ˽) sx[lx-1] „˽‟) (sa[j]= „˽‟ (sa[i+lx:i]=sx) (sa[k]= „˽‟), i,j,k |(0 i la- lx) (0 j i) (i+lx k

Ma= Ma [*] Mb [*] Mb= Mb substring(Ma,0,∞)

(endsWith, sa,sb) sa[la:la-lb]=sb

Ma= Ma [*] Mb Mb=suffises(Ma)

sa[lb:0]=sb (startsWith, sa,sb) Ma:= Ma Mb [*]

Mb:=prefixes(Ma)

3.3. String constraint solving based on automata

In this algorithm, when the edges in the processing of the automata are modified to reflect the effects of the constraints, the nature of the modification depends on the constraints. For example, edge (contains, sa,sb) corresponds to the constraint sa.contains(sb).

atomatonSolver(PathCondition pc;StringGraph StringGraph sg=(I S,E, Ʊ)) for( edges si S) Automaton Mi=[li]; W=E while W≠∅ W=W\e for(si connects to e) Update Mi based on constraint e; if(Mi=∅) pc=pc freshConstranints(pc:sg); 1: 2: 3: 4: 5: 6: 7: 8: 9:

15 10: 11: 12: 13

return(false,pc,sg); else if(Mi is modified) W=W {all edges connected to si}; return checkNegativeEdges(pc;sg;(M1,M2..)); Figure 3.1. String constraint solving based on automata

For negative constraints such as !equals, !constain, !startsWith and !endsWith, assumption that M1 and M2 are two automata corresponding to two string variables s1,s2 that have to solve the constraint s1.!equals(s2). Then, there are three following cases:

Case 1: Both automata guess only one word and then this constraint is satisfied when these two words

are not equal.

Case 2: There is only one automation that can guess a word. Then, we need to remove that word from the

other automation. In this case, the constraint will also be satisfied.

Case 3: In all other cases, the constraints are satisfied and no automation must be changed. The main part of the algorithm in Figure 3.6 describes formally the steps used for finding the solutions to the negative constraints.

1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13 CheckNegativeEdges(pathcondition pc, StringGraph sg, (M1,M2..)) i=1; M‟1=M1; M‟2=M2;… M‟k=Mk; while(1≤i

Figure 3.2. Negative constraint solving procedure

Making an update to the constraints: If the string constraints are concluded as not satisfied, the new integer

constraints are added to the path constraint.

3.4. Proposing string constraint solving in symbolic constraint

The main steps to perform execution include:

Step 1: The path constraint condition will be converted to an intermediate format. Step 2: Solve or simplify the intermediate format. Step 3: Replace integer symbol variables with conjecture specific values. Step 4: Convert intermediate format to automata or Bitvector operators. Step 5: If constraint solving on the automata or bitvector fails, repeat Step 3 with other integer specific values in the set of best integer values.

16 Step 6: Constraint solving progress ends if a satisfactory string value is found or within a limited time, no new integer value can be found, and constraint solving progress on the automata and bitvector has no matching value. . 3.4.1 String constraint modeling using graph

The process of string constraint modeling using the graph is shown in the diagram below:

Figure 3.3. String constraint modeling using the graph diagram

The process is as follows:

In both cases above, the results are evaluated by using a set of suitable assessments for each of the

mentioned cases. The evaluator is used to compare the constraint solvers.

3.4.2 Discovering more string constraints in string data

For a symbolic execution model, in addition to a string variable, all variables include integer variables, which are assigned run-time values collected in dynamic symbol execution. This study is not modeling for mixed constraint solving (constraint solving for mixed string and integer data). However, in string constraint modeling, the execution progress extracts integer constraints on the length of the strings. This helps to find a satisfied length assignment. Thus, integer constraints are arised on string data. Most modern constraint solvers for integer data are efficient and accurate. Hence, generating integer constraints is not interfere with execution.

3.5. Experiment and result evaluation All experiments are conducted on a Windows 10 system with 8 GB RAM, using the open source tool

JavaPathFinder downloaded at

http://javapathfinder.sourceforge.net/ và phần mềm Eclipse Java EE IDE

17 Implementation 1: 2: 3: 4: 5: 6: 7: 8: 9: 11: 12: 13: 14:

SOLVE(PathCondition pc) StringGraph sg (pc,sg)←BuildStringGraph(pc); boolean sat←false; while (!sat˄!timeout) (sat,pc,sg) ←IntegerSolver(pc,sg) if(sat) (sat,pc,sg) ←StringSolver(pc,sg) if( pc unchanged): break; return sat BuildStringGraph(PathCondition pc) StringGraph sg:= ∅ for( String or mixed contraint c pc) sg:=sg Hyperedge(c) return Preprocess(pc,sg)

Figure 3.4: String constraint solving algorithm

The algorithm follows two these main steps Step 1: JPF constraint solver will be used for types of constraints on integer, real number, boolean, v..v.. Step 2: If Step 1 perform successfully, it will provide some specific values. These values will be loaded to the string graph (the associated values in the mixed constraint) and a string constraint solver is called to solve them.

1: 2: 3: 4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14: 15: 16: 17: 18: 19: 20: public static String s = "http://www./BigWeb"; private static boolean IsBigWebQuery(String str) { // (1) Check if after the last „/‟ there is //exist"BigWeb" int lastSlash = str.lastIndexOf('/'); if (lastSlash < 0) { return false; } String rest = str.substring(lastSlash + 1); if (!rest.contains("BigWeb")) { return false; } // (2) Check if string s begins with "http://" if (!str.startsWith("http://")) { return false; } //(3) Get a middle string of http://, the last //character of "/" and if it starts by “www”. //Remove them String t = str.substring("http://".length(), lastSlash); if (t.startsWith("www.")) {

18

t = t.substring("www.".length()); } //4) Check if after removing “www”, the string //still contains "other.com" or "google.com" if (!(t.equals("other.com")) && !(t.equals("google.com"))) { return false; } return true; } public static void doTest() { IsBigWebQuery(s); }

21: 22: 23: 24: 25: 26: 27: 28: 29: 30: 31: 32: 33: 34: 35: 36: 37: 38:

Figure 3.5. Testing Java program

19 3.5.2 Experimental results This section presents the results of the entire method as described above.

Figure 3.6. Result of generating test case on some operations of strings

20

Figure 3.7. code under test Starting with a Java program inferred from path conditions, a string graph is constructed and preprocessed. This graph is transformed to autotomata operators and Bitvector constraints. These constraints are solved. The conversions among constraint solvers are made (if necessary) with limited character sets in {„ ‟,‟a‟,‟b‟}.

Figure 3.9 shows a Java program that consists of a function with two string parameters. In line 1, there is a branch where the first string parameter can be preceded by a space. If the parameters are equal, line 2 creates a new string variable by removing whitespace at the ends (and this new variable will never satisfy if statements on line 1). Line 3, check if the new variable ends up with a string "ab". Otherwise, two parameters are check if not equal in line 7. Finally, the last branch is on line 12. This line checks if the fifth character of the first string variable equals the character „a‟.

21

Figure 3.9. A Java program

In this program, with two random strings, it would be very difficult to execute any one of the four branches of the program. Therefore, we have to come up with the best data to execute this program with all possible branches. To execute these branches, all the possible path constraints of the program is built. In this program, there are 12 path conditions but the study selects only two path conditions for illustrating. The first condition, the branch on line 1 is taken, followed by the branch on line 5. The branch on line 9 satisfies these possible data flows according to the constraints that must be met.

Table 3.2: Representation of constraints in edges of string graph in Figure 3.10

Constraint Edge

s1.startsWith(' ') s1.equals(s2 ) news1.endsWith('ab') s1.equals(s2) s1.indexOf (5) == ('a') (startsWith, s1,‟˽‟) (equals, s1, s2) (notEndsWith, s3,”ab”) (notEquals, s1, s2) (indexOf , s1,‟a‟,5) (trim, s1, s3)

The choosing this path condition leads to the construction the string graph shown in Figure 3.10 with the

edges given in the Table 3.2 above.

22

Figure 3.10. The string graph 1

Running the preprocessing tuble on the graph shows the constraint with values not satisfying the edge (equals,s1, s2 và notequals(s1,s2). Preprocessing tuble will conclude UNSAT and a new path condition is selected. Suppose that the next path condition are cases on lines 1, 3, 9 or s1 must be begun with a space and it has to equal to s2 after cutting the spaces at both ends. This string also must end with the string "ab" and have a 5th character as „a‟. Thus, it creates a constraint.

Table 3.3. Representation of constraints in edges of string graph in Figure 3.11

Constraint Edge

s1.startsWith(' ') s1.equals(s2 ) news1.endsWith('ab') s1.indexOf (5) == ('a') (startsWith, s1,‟˽‟) (equals, s1, s2) (endsWith, s3,”ab”) (indexOf , s1,‟a‟,5) (trim, s1, s3) This path condition will correspond to following string graph:

Figure 3.11. The string graph 2

Applying the preprocessor on above string graph will remove the vertex s2 from the graph and will

remember that the solution of s1 is also the solution of s2 when the solutions are reported back to the call.

The following integer constraints are added to prepare the integer constraint solver and will give its first solution. According to the approach of this study, it is called the best prediction (li is the length of the symbolic variable si):

l1>0 l2>0 l3>0 l1=l2 l3 2 l1 l3 l1 1 l1>5 Before starting the string constraint solver, the integer constraint solver is called to give the specific set of satisfied integers. Suppose that these values are l1=6, l2=6, l3=2. These values supply a suitable solution because they are the smallest values for which each variable will satisfy the constraint.

23

Considering the approach of first automata maps each string si representing the automation Mi. Each

automation is initialized to contain all words of length li. Thus M1=…… and M3=… Each edge of the graph is traversed.

i. ii.

iii.

iv.

startsWith(s1,‟˽‟) M1 is intersected with „˽‟.* and re-assign for M1. The new result of M1 is ˽….. trim(s1,s3) M1 is intersected with language of M3, in which M3 are the words with spaces at the beginning and the ending M1 ˽.* M3 ˽.*. Thus, language of M1 is ˽…… M3 will be intersected with language of M1 after removing all spaces at the beginning and the ending. As the result, language of M3 is [a-b],[b-a] indexOf(s1,‟a‟,5) M1 is intersected with language ….a.* and the result of intersection is assigned to M1 as ˽….a endsWith(s3,”ab”) M3 is intersected with language “ab”. The obtained language of M3 is “ab”. Similarly as above steps, after traversing edges, we have M1= ˽….a. and M3=ab. and the final result is

s1,s3 will be “˽ ˽ ˽ ˽ ˽ab” and s3 will be “ab”

By Bitvector approach, the best estimation l1=6, l2=6, l3=2 is used. Two Bitvector variables b1 and b3

i. ii.

iii.

iv. are generated corresponding to the length 6 and 2. Traversing each edge of graph we get: startsWith(s1,‟˽‟) creates the constraint as b1[0]= ‟˽‟. This constraint is added. trim(s1,s3) will add these constraints b1[i]=b3[0] b1[i+1]=b3[1] where i=0,i

After each edge, we push the constraints onto the Bitvetor constraint solving stack. It will be concluded as SAT or UNSAT. When the constraint endsWith is pushed on the stack, the result as UNSAT is obtained since b1 ends with „a 'and b3 ends with„ b ‟. When UNSAT is got from Bitvector solver and new constraints l1 6 l3 2 are added, the state of constraint solver is deleted and the transfer progress is restarted from the first edge. Some best predictions can be done. In this time, with l1=7 l2=7, l3=2, the conversion will be similar with previous interations. In fact, variable b1 will end with „b‟ and Bitvector solver will report as SAT. The satisfied assign statements are returned by s1, s3 as“˽ ˽ ˽ ˽ ˽ab” and s3 as “ab”.

3.6. Conclusion of chapter 3

This chapter of the thesis presents the proposals related to string constraints solving in symbol execution. From the concepts of string graph, string constraint solve based on the automata and BitVector methods. Simultaneously, this chapter also proposes a model used to string constraint solving based on automata. The analysis and evaluation obtained results are also presented in this chapter.

CONCLUSION

The main contributions of the thesis:

The thesis studies an overview of symbol execution techniques in automatic generating test cases. Based on that, the selection of the problem to improve the string constraint solver in order to increase the ability to solve the constraint and to apply in automatic generating test cases for string data. After the research process, the main contributions of the thesis include:

To model the constraints in the form of a string graph. Then, to simplify the constraint or detect constraints that do not have satisfied values in order to reduce the time to solve the string and mixed constraints.

24

To propose a constraint solving algorithm based on the results in the preprocessing step on two approaches: based on automata and based on Bitvector. From this result, the progress of automatic generating test data on a string data is introduced.

To implement the proposed algorithms with the extension of the Java Path Finder to install the constraint modeling using the string graph and the constraint solving on the Java programming language. Experimental results of these algorithms on a number of typical string operators are also presented. The future works: In order to continue the results of the thesis, the researches in the future can be done in the following

directions:

LIST OF THE PUBLISHED [CT1]. Hà Thị Thanh, Tô Hữu Nguyên, Nguyễn Hồng Tân, Nguyễn Văn Việt, Nguyễn Lan Oanh (2013), “Ứng

dụng giải thuật di truyền vào sinh testcase hiệu quả”, Tạp chí Khoa học và công nghệ - Đại học Thái Nguyên,

Tập 110, số 10, trang 131-135.

[CT2]. Tô Hữu Nguyên, Nguyễn Hồng Tân, Hà Thị Thanh, Đỗ Thanh Mai (2016), “Thực thi tượng trưng trong

sinh tự động dữ liệu kiểm thử phần mềm”, Tạp chí khoa học đại học Đà Lạt, Tập 6, Số 2, trang 258–178.

[CT3]. To Huu Nguyen, Tran Thi Ngan, Do Thanh Mai, Tran Manh Tuan (2016), “A novel symbolic execution

model in automated generation of test cases”, International Research Journal of Engineering and Technology

(IRJET), Volume: 03 Issue: 07. Page 1-7.

[CT4]. Tô Hữu Nguyên, Đỗ Thanh Mai, Nguyễn Hồng Tân(2016), “Kỹ thuật mô hình hóa ràng buộc hỗn hợp và

phương pháp giải ràng buộc xâu trong thực thi tượng trưng”, Tạp chí Khoa học và Công nghệ, Đại học Thái

Nguyên, 159(14), tr.141-147

[CT5]. To Huu Nguyen, Tran Thi Ngan (2020), Finding new integer constraints incorporated with string

constraints using automata in symbolic execution, International Research Journal of Engineering and

Technology (IRJET), Volume 7, Issue 6, pp. 4144 – 4148.

[CT6]. To Huu Nguyen, Nguyen Truong Thang, Dang Van Duc (2020), Mixed constraint solvers in symbolic

execution using multi-track automata, International Journal of Latest Engineering Science (IJLES), Volume: 03

Issue: 04

Summary of Mathematical Doctoral Thesis: Some improvements of string contraint solving in automated test cases generation for symbolic execution

MINISTRY OF EDUCATION AND

VIETNAM ACADEMY

TRAINING

OF SCIENCE AND TECHNOLOGY

GRADUATE UNIVERSITY SCIENCE AND TECHNOLOGY

----------------------------

To Huu Nguyen

SOME IMPROVEMENTS OF STRING CONTRAINT SOLVING IN AUTOMATED TEST CASES GENERATION FOR SYMBOLIC EXECUTION

Major: Mathematical foundations for Informatics

Code: 9 46 01 10

SUMMARY OF DOCTORAL THESIS

HaNoi - 2020

1

2

3

4

c

c

c

c

5

1.5.2. Static symbolic execution

6

in this chapter. The basic aspects, overview of symbolic execution methods and unresolved problems are presented.

7

8

9

2.4. Mixed constraints and improvement in string constraint solving

10

11

2.4.3. Pre-processing progress

12

13

14

15

10: 11: 12: 13

16

Step 6: Constraint solving progress ends if a satisfactory string value is found or within a limited time, no new integer value can be found, and constraint solving progress on the automata and bitvector has no matching value. . 3.4.1 String constraint modeling using graph

17

Implementation 1: 2: 3: 4: 5: 6: 7: 8: 9: 11: 12: 13: 14:

18

19

3.5.2 Experimental results This section presents the results of the entire method as described above.

20

21

22

23

24

Có thể bạn quan tâm

Improvements of directed automated random testing in test data generation for C++ projects

Doctoral Dissertation Mathematics: DC algorithms in nonconvex quadratic programming and applications in data clustering

Mathematics Doctoral Thesis: Fuzzy clustering techniques for remote sensing image analysis

Doctoral Thesis: Developing deep neural networks for network attack detection

Summary of Mathematical Doctoral Thesis: Some improvements of string contraint solving in automated test cases generation for symbolic execution

Summary of doctoral thesis: Some improvements of string contraint solving in automated test cases generation for symbolic execution

The thesis for the degree of doctor of Philosophy in mathematics: Improving some artificial immune algorithms for network intrusion detection

Summary of the doctoral thesis of Mathematics: Improving some artificial immune algorithms for network intrusion detection

Tài liêu mới

Luận án Tiến sĩ: Nghiên cứu xây dựng thuật toán thích nghi và học tăng cường cấu trúc Actor - Critic điều khiển bám quỹ đạo cho robot di động đa hướng mecanum

Luận án Tiến sĩ: Cơ cấu bệnh tim mạch và chất lượng cuộc sống của người cao tuổi mắc suy tim, rung nhĩ điều trị tại Bệnh viện Thống Nhất, thành phố Hồ Chí Minh

Tóm tắt Luận án Tiến sĩ: Nghiên cứu hiện tượng nứt dăm đê sông vùng đồng bằng sông Hồng và dự báo khả năng bị nứt của một số đoạn đê

Tóm tắt Luận án Tiến sĩ: Nghiên cứu xây dựng giải pháp đảm bảo an toàn thông tin cho quá trình học liên kết dựa trên mật mã

Tóm tắt Luận án Tiến sĩ: Phát triển năng lực đánh giá công nghệ cho học sinh trong dạy học môn Công nghệ 11 ở trường trung học phổ thông

Tóm tắt Luận án Tiến sĩ: Nghiên cứu phân loại chi cầu diệp – Bulbophyllum Thouars (Orchidaceae) ở vùng Tây Nguyên bằng phương pháp hình thái và phân tử

Tóm tắt Luận án Tiến sĩ: Nghiên cứu đặc điểm phân bố và dinh dưỡng của các loài lưỡng cư ở Vườn Quốc gia Bến En và Khu bảo tồn thiên nhiên Pù Luông, tỉnh Thanh Hóa

Luận án Tiến sĩ: Tổng hợp luật dẫn và điều khiển cho một lớp tên lửa đối hải trên cơ sở ứng dụng mạng nơ ron và hệ mờ

Luận án Tiến sĩ: Nghiên cứu tổng hợp hệ điều khiển góc Pitch tua bin gió trong điều kiện có nhiễu tác động

Luận án Tiến sĩ: Nghiên cứu hóa học lipid của hai loài san hô thủy tức Millepora dichotoma và Millepora platyphylla ở Việt Nam

Luận án Tiến sĩ: Nghiên cứu kiểm soát phân phối công suất kéo trên cầu chủ động của ô tô con bằng ABS

Luận án Tiến sĩ: Ứng dụng phản ứng Domino vào tổng hợp các dẫn xuất Podophyllotoxin, Pyrimidine và đánh giá hoạt tính sinh học của các chất tổng hợp được

Luận án Tiến sĩ: Nghiên cứu thành phần hóa học và một số hoạt tính sinh học của cây chùm ngây (Moringa oleifera)

Tóm tắt Luận án Tiến sĩ: Nghiên cứu thành phần hóa học và ứng dụng ức chế ăn mòn cho thép của cao chiết xuất từ cây Lộc vừng thuộc họ Lecythidaceae

Luận án Tiến sĩ: Nghiên cứu hiện tượng nứt dăm đê sông vùng đồng bằng sông Hồng và dự báo khả năng bị nứt của một số đoạn đê

AI tóm tắt

Giới thiệu tài liệu

Đối tượng sử dụng

Từ khoá chính

Nội dung tóm tắt

Giới thiệu