MINISTRY OF EDUCATION AND
VIETNAM ACADEMY
TRAINING
OF SCIENCE AND TECHNOLOGY
GRADUATE UNIVERSITY SCIENCE AND TECHNOLOGY
----------------------------
To Huu Nguyen
SOME IMPROVEMENTS OF STRING CONTRAINT SOLVING IN AUTOMATED TEST CASES GENERATION FOR SYMBOLIC EXECUTION
Major: Mathematical foundations for Informatics
Code: 9 46 01 10
SUMMARY OF DOCTORAL THESIS
HaNoi - 2020
This thesis is completed at: Graduate University of Science and Technology, Vietnam Academy
of Science and Technology
Supervisors 1: Dr. NGUYEN TRUONG THANG Supervisors 2: Assoc.Prof.Dr. DANG VAN DUC
Reviewer 1: ……………………………………………………………………… ……………………………………………………………………… Reviewer 2: ……………………………………………………………………… ……………………………………………………………………… Reviewer 3: ……………………………………………………………………… ……………………………………………………………………… The thesis is to be presented to the Defense Committee of the Graduate University of Science and Technology – Vietnam Academy of Science and Technology At Date Month Year 2020
The thesis can be found at:
- Vietnam National Library
- Library of Graduate University of Science and Technology
1
INTRODUCTION
In general, for each technique used in dynamic run-time testing, any software can be separated into two stages: preparing the test cases for software testing and performing the program on a background that supports to testing frameworks for available test cases.
One of important activities in order to reduce testing cost is to generate test cases automatically and completely. The software developing organizations often pay a large amount for activities related to software testing. The effectiveness of verification and validation progress depends on the number of errors that has been found and fixed before transferring the software. It means that the quality of a software depends strictly on the quality of generated test cases.
In past years, many researches from international scientists focused to “automatic test case generating” [3, 4] in order to reduce the cost of software developing. There are two main approaches in this topic including testing based on code and testing based on model. Testing based on code has a high coverage. This approach can remove unnecessary instructions that may cause the errors. This approach is also optimizing the computations of testing analysis softwares. Thus, many recent researches concentrate to testing based on code [5, 6, 7].
There are also many researches in automatic test case generating such as test case generating based on specification, test case generating based on model, test case generating by path approach and smart technique. However, testing based on symbolic execution is an attractive research strategy. The basic test case generating techniques were proposed including test case generating by theorem proving [8, 9], test case generating by symbolic execution [5, 7,10,11,12], test case generating by model checking [13], test case generating by an event-flow model [14], test case generating using a Markov chains model [15]. In which, symbolic execution gets the attention of many researchers. Symbolic execution is used to develop and construct many applications [5, 6, 7] on various tools and programming languages such as C/C++, JavaScrip, .NET, java, HTML, etc.
In this thesis, some improvements in string constraint solvers applied in test case generation are studied.
The experiments and analysis are evaluated using Java, a strong, modern programming language.
The research objective:
- To study the methods used to modeling constraints and improve constraint solving capability. - To apply symbolic execution technique in automatic test case generating. - To implement proposed methods in automatic test case generating on string constraints and mixed
constraints.
- To analyze and evaluate the obtained results.
Object and scope of the research:
This thesis aims to research on overview of automatic test case generating, symbolic execution applied into automatic test case generating. The modeling techniques based on Automata and Bitvector are also studied in this thesis. Apart from that, the analysis and evaluation of available test case generating methods on different constraint are mentioned. The quality and effectiveness of test cases generated by using symbolic execution are assessed. Content of the research: - The automatic test case generating techniques, the contents related to constraint modeling on different
types of constraints. Improvements of constraint modeling and constraint solving on string constraints.
- - Applying these improvements in symbolic execution to generate test cases automatically on string
constraints for testing programs.
Research methodology:
2
-
- Study, analyze, synthesize the documents related to symbolic execution, the role of constraint solving and string constraint solvers in symbolic execution to generate test cases automatically (papers from journals, conferences, etc). Improve the constraint modeling methods and constraint solving capability on string constraints and mixed constraints.
- Analyze and evaluate the published research results.
The main contributions:
Improved the capability of constraint solving in symbolic execution. Implemented constraint modeling techniques and constraint sovling based on automata and bitvector.
- Constructed a constraint modeling on string and mixed constraints. - - - Compared the obtained results with available researches.
The structure of the thesis:
The structure of this thesis is organized as below Chapter 1 presents overview of software tesing and symbolic execution techniques in automatic test case generation. The basic theory related to contents of thesis is also given in this chapter. Based on mentioned theory, the research problem and research approaches are stated.
Chapter 2 shows the results of researches on constraint modeling, constraint solving in symbolic execution. The applications of these tools in typical situations and the evaluation of these methods on typical data types.
Chapter 3 introduces the results of researches of improvements in constraint modeling, the constraint solving in symbolic execution on string constraints and mixed constraints. An extension of symbolic execution technique and the string constraint solving based on automata are also presented in this chapter.
The conclusions of this thesis address the main contributions, the future works and the list of references are
presented in the thesis.
3
CHAPTER 1. OVERVIEW OF SOFTWARE TESTING
AND SYMBOLIC EXECUTION
This chapter presents the overview of fundamental aspects of software testing, symbolic execution and applications of symbolic execution in generating test cases. The contents of this chapter mainly focus on basic concepts and analysis of symbolic execution. Moreover, some challenges and future strategies are also presented in this chapter.
1.1. Software testing
Ensuring a software system or a component of a software system work as expected is a big challenge in software industry. The mistake softwares cause the loss of economy and other serious effects on the fields that the softwares are used.
Figure 1.8. Relation between software development and software testing
1.1.2 Testing techniques [24]: - Black box testing - White box testing - Unit test
1.2. Control flow white box testing technique
Control flow white box testing is divided into static and dynamic testing. The input of these two strategies are the source code of program and testing coverage criteria. After analyzing progress, the output is a set of test cases. Each strategy has the advantages and disadvantages. The objective of control flow white box testing technique is to find the minimum set of test cases that gets the highest coverage.
4
c
c
c
c
sequentially if…else do….while while….do for
Figure 1.6. Common control structures
1.3. The comparison of static and dynamic control flow white box testing technique
The comparison of these static and dynamic control flow white box testing technique in term of the advantages and disadvantages is presented in the thesis. The number of test cases, the run time needed in test case generating, the loop test capability and the effects of the complexity in source code are used in comparison. 1.4. The challenges in software testing
There are some the wrong ideas in software testing such as a good test case is always the high complexity; automatic testing can replace software engineers in order to test the software well; software testing is a simple and easy work; everyone can test a software without any training.
Software testing meets other limitations. The correction of software specification, the accuracy of tesing system are not guaranteed. There is no testing tool that is suitable for every software. Software engineers can not sure about the understand of software product. The resource for complete testing is not enough. We can not perform the testing activities completely.
1.5. Symbolic execution 1.5.1. Overview of symbolic execution
In computer science, symbolic execution is a tool used in analysing a program in order to define the input of each component in that component. A compiler of a program get a symbolic value as input instead of a crispt input as usual.
def f(x,y); if( x>y); x=x+y y=x-y x=x-y if (x-y>0); assert false return (x, y)
Figure 1.7. Example of a concrete idea of a symbolic execution
5
1.5.2. Static symbolic execution
The main idea of symbolic execution [31] is to execute the program with symbolic values instead of
typical vales of input parameters.
Symbolic values of variable x can be represented by: (a). An input notation. (b). A combined expression among symbolic values using operators. (c). A combined expression between symbolic values and typical values using operators.
Figure 1.8. Symbolic execution tree
1.5.3. Dynamic symbolic execution
Dynamic symbolic execution [32, 33] is symbolic execution technique based on dynamic program analysis. Dynamic symbolic execution is the combination of typical execution and symbolic execution. In dynamic symbolic execution, the program is executed many times with different values of input parameters.
Table 1.1. An example of a dynamic symbolicexecution.
Constraints are remembered Contrains C’(i) Input i C(i)
null a = = null
a!=null {} a!=null && !(a.length>0)
a!=null && a.length>0 a!=null&& a.length>0 {0} && a[0]!=123456789
a!=null && a.length>0 && a!=null && a.length>0 {123456789} && a[0]!=123456789 a[0]!=123456789
1.5.4 Concolic execution Concolic execution is a typical performance model. Concolic execution is a modified version of symbolic
execution. DART [36] and CUTE [35] are two tools of concolic execution. 1.6. The conclusion of chapter 1
Chapter 1 of the thesis presents the fundamental concepts of software testing techniques, basic techniques of automatic test case generating from the models. Apart from that, the comparison of static white box testing with dynamic white box testing is also stated. Moreover, the challenges of software testing is given
6
in this chapter. The basic aspects, overview of symbolic execution methods and unresolved problems are presented.
CHAPTER 2. SYMBOLIC EXECUTION AND CONSTRAINT MODELING
In this chapter, the difficulties of symbolic execution are analysed and assessed. Besides, the extended tools, the installation of symbolic execution techniques used in improving string constraint modeling technique are also presented. The modeling methods used in installation, evaluation of string constraint and mixed constraint solving methods are addressed in this chapter.
The contents of this chapter are introduced in published results of PhD student, including [CT1], [CT2],
[CT3]. 2.1. The difficulties and some solutions In this section, the difficulties in applying symbolic execution are discussed. Some solutions used to overcome
the challenges are presented. 2.1.1. Path exploration
One of the biggest challenges is path exploration. It means that a large number of paths is generated even though the tested program is small. The number of paths is often by a power function. In a certain perioid of time, the testing program has to discover the first path that is the most suitable. 2.1.2. Memory modeling
The accuracy of instructions in program when transferring to symbolic constraints affects significantly to the coverage of symbolic execution and the constraint sovling capability as well. For example, in order to establish the range of integer parameters, a memory modeling technique is more effective than the normal way. However, in this case, the accuracy of code analysis is worse. It depends on the selection of value range. For example, math memory over flow error can cause the lack of paths in symbolic execution or mining the impossible paths, etc.
2 .2. Symbolic execution and extended tools 2.2.1. Symbolic execution and software testing Test case automatic generating techniques used symbolic execution are installed on Java to generate test data for programs. However, the performance of available tools has their own disadvantages. Constraint optimization algorithm is applied in order to reduce the execution time and guarantee the effectiveness of coverage. This solution is implemented on different Jave programs and gets the good results. This section presents the aspects of Java PathFinder (JPF) including the checking ability of JPF, high level architecture of JPF and extension capability of JPF.
2.2.2. Symbolic execution on Java programming language Introduction to Java PathFinder (JPF)
Nowadays, symbolic execution techniques is performed by various tools on different programming languages such as DART[36], CUTE[35] on C/C++ source code or PEX[36, 37] on .NET source code. In this thesis, path finding is set up on Java because Java is a programming language that independs on environment. Java is also used in many important applications. Moreover, JPF is a powerful tool that support for symbolic execution. This leads to the widely extension ability and a huge developmental community of JPF.
7
JPF is a verification set of script state model for Java [68]. Not same as normal virtual machine, JPF is a virtual machine performing Java programs more than once. JPF executes on all branches, all possible paths of a program. JPF checks the range of attributes such as the exceptions that can not catch through potential execution paths.
Figure 2.1 describe the model of JPF
Figure 2.1. The model of JPF
Checking ability of JPF JPF can check all Java programs. It can find dead key or exception. Apart from that, JPF can develop
and extend by itself to test other attributes.
High level architecture of JPF
JPF is designed by two main components, including JVM and Search (as in Figure 2.3 below). In which, JVM is a Java typical state generation by executing Java bytecode statements. Search is used to choose the state that JVM should process, to direct JVM in generating the next state or to request JVM to go back a previous state.
8
Figure 2.3. High level architecture of JPF.
The main installation of Search includes Deep first search (DFSearch) and HeuristicSearch.
Extension ability of JPF
JPF is considered as a framework that each developer can extend it in order to support for a typical purpose. JPF supplies an extension mechanism for developers in order to allow to add new functions without changing directly the installation of Search or JVM
With this flexible extension structure, there are several extensions of JPF.
2.3. Constraint solving and symbolic execution Related to generating test cases, genetic algorithm approach has been studied in research work [CT[1] of
PhD student. However, this approach containts many limitations.
Although many improvements of constraint solving techniques have been proposed, the constraint solving is one of big obstacles of symbolic execution. Constraint solvers often take a high cost in run time. Thus, symbolic execution can not extend in some kind of program which the source code requires a high performance in constraint solving.
To overcome this challenge, the optimization step in constraint solving is necessary in symbolic execution. This is very effective in real programs. There are two optimization methods used in modern tools, including:
- Removing unrelated constraints - Constraint solving using incremental methods (Increamental Solving) [71]
In constraint sets that are generated in symbolic execution, the representation for a set of static branches from source code is one of the most important properties. In KLEE, all query results is stored in buffer mapping the set of constraints and the typical values corresponding to those constraints. The special case happens when there are no solution or no satisfying a special notation.
9
2.4. Mixed constraints and improvement in string constraint solving
In fact, string constraints appear in most of programs, especially in programs or procedures that clean strings in input data in order to test the validity, to discover and to remove harmful contents. String data in Jave is equipped by a huge library. String data conceals many errors.
To accomplish testing capability and error discovering ability in Java programs with string constraints, the thesis focuses to the progress of defining errors in order to get the better programs. By applying software testing on string constraints, the coverage ability increases and then the quality of software is also improved. Why does symbolic execution on string data meet difficulty?
The operators on string data are the mix of two value domains: string and integer. For example, we need to get the nth character in a string. In present, there are several solutions of symbolic execution on string data. But these solutions have just support for a subset of these operators. The iterative approach is chosen. First, we sovle the constraints on integer data. Then, the obtained results are used to solve the string constraints. If string constraints are satisfied, the task is complete. Otherwise, integer constraints are added into the constraint set. This procedure is repeated until there is no unsatisfied string constraint.
The main idea for proposing the extension of symbolic execution is presented as in Figure 2.7 below.
Figure 2.7. An example a string graph
-
-
The iterative progress is performed by two following steps: Step 1: Current solver of JPF is called to process integer constraints, real constraints, logic constraints,… If all these constraints is solved, the step 2 is performed. Step 2: + The typical values obtained from Step 1 are loaded in string graph. + The string solver is called to solve these constraints. In this case, string solver can be AutomataSolve or BitVectorSolve. The constraints of string graph may not satisfied with input values. Then, if there is enough time, Step 1 is run and string constraint set is added by a new integer constraint
1: 2: 3: 4: 5:
Solve(PathCondition pc){ StringGraph sg; sg=BuilStringGraph(pc); boolean sat=false; While(!sat and !timeout){ sat=CurrentSolver(pc,sg); if(sat) 6:
7: 8: sat=StringSolver(pc,sg); return sat; } }
10
1: 2: 3: 4: buildStringGraph(PathCondition pc) { StringGraph sg ; for string or mixed constraint c 2 pc: sg= sg hyperedge(c) return preprocess(pc; sg)
}
Figure 2.8. Mixed constraint solver algorithm
The limitation of time “timeout” is added in order to guarantee the combination of string solver and integer solver can not give the final conclusion in a allowable time. If it reaches to “timeout”, the algorithm stops, the given constraint is concluded as unsatisfied. 2.4.1. String graph
There are some types of constraints including string constraints, integer constraints, string and integer mixed constraints. In the installation progress, string constraints and mixed constraints are represented by special graph called as string graph.
Definition: String graph is a direct graph denoted by H=(X, E, Ʊ). In which, the vertex set is defined by X=IS where I and S in separated sets (I presents for integer variables and S presents for string variables). Ʊ presents for the operators on string variables. E presents for labeled direct edges in graph (E = E1 E2 E3 E4) satisfying En= Ʊ Xn. Using graph is a popular method in constraint satisfying programming. Figure 3.1 describe an example
for string graph corresponding to constraint:
s1.trim().equals(s2)^s1.equals(s3.concat(s2))
2.4.2. Constraint re-construction
Preparing for pre-processing progress, for each string constraint or mixed constraint of path constraint condition, considered as an edge of string graph, the graph re-constructs constraints by constraints and returned values of operators that are mapped directly into the edges of string graph.
For example, the constraint s1.equals(s2) is added to edge (equals, s1,s2) with other operators. The returned value can be a char, an integer or a string. An extra variable is created and represent for result of operator. This extra variable is added as a vertex of string graph. The graph allows to add an edge as a returned value.
In the example showed in Figure 2.7, each circle is a vertex of string graph. The dots and lines represent for edges in this graph. Each edge is labeled by the operator and the number on each edge shows the order of vertex in string graph corresponding to a constraint.
The string variables s1, s2 and s3 appear as the vertex, elements of S. In this graph, Ʊ = {trim, equals, concat}. Operator trim creates an extra variable s‟ added into S. The operator is the edge (trim, s1, s‟) is labeled by element trim of Ʊ. This edge connects s1 and s‟. The operator concat is the same. It is the second constraint that creates extra variable s” and edge (concat, s3, s2, s”). The operator equals corresponds to two last edges. When a string graph is completed, it is processed by two following ways:
1. 2. Define the constraints that have no satisfied values in a simple way. Add the values generated by integer solver in path constraint expression.
11
2.4.3. Pre-processing progress
When unsatisfied constraints are determined, execution progress will skip string constraint solving. Consequently, time consuming decreases. Eliminating equals edge from string graph is one of proposals. In above example, string graph is adjusted to new graph as shown in Figure 2.9.
Figure 2.9. String graph obtained by removing equals operators.
This also supports to define rapidly set of constraints represented in string graph that have no satisfied value. For example (shown in Figure 2.10), after removing edge in Figure 2.10.(a), vertex s1 connects to itself by notEquals edge. This connection is presented in Figure 2.10.(b). Then, there is not existing any string which satisfies this constraint.
Most of proposals are introduced to deal with the constant operators. For example, the constraint s1.concat(“xyz”) = ”vwxyz” will lead to the elimination of the complex edges. Then, vertex s1 is replaced by string “vw”. Another example, the constraint s1.startsWith(“abc”).chatAt(0) ≠ „a‟ is not suitable and it is easy to discover. If this situation appears in pre-processing stage, the use of string constraint solver is unnecessary and false value is returned.
Figure 2.10. Unsatisfied constraints after removing equals operators
2.4.4. String constraint generating and the implemental results
Before the constraints are generated, new vertices are added in string graph. For each vertex that is not a string constant SiS, an integer variable liI is added in order to represent for the length of string Si. Figure 3.4 shows that the new vertex of string graph in Figure 3.2 is connected to corresponding string vertices by dashes.
12
Figure 2.11. New vertices represent for the length of added string
Other constraints are based on the edges as shown in Table 2.1.
Table 2.1. Constraint construction for operators on string.
Java operators Heper-edge
For each constraint of vertex is not a string constant Si S Sa.charAt(n) Sa.concat(Sb) Sa.indexOf(Sb) Sa.lastIndexOf(Sb) Sa.subString(n,k) Sa.contains(Sb) charAt,Sa,n,c concat, Sa , Sb, Sx indexOf, Sa, Sb,x lastIndexOf, Sa, Sb,x subString,Sa,n,k,Sx contains,Sa,Sb New constraint li>0 la≥n lx=la+lb (x=-1)˅(la≥lb+x) (x=-1)˅(la≥lb+x) la=n+lx la≥lb
2.4.5. Constraint solving using automata
Constraint solvers is popular tools used to deal with many problems in reality such as theorem proving, time scheduling in real time. Symbolic execution in automatic generating test cases is used in research works CT[2], CT[3] by PhD student. Many researchers reduce the limitations of available constraint solvers in order to improve the applications in symbolic execution, especially in test case generating. Although there are various improvements, the constraint solvers are not effective with some certaint constraints.
In computer programs, string data appears frequently, especially in the program components or procedure. The Java programming language provides a huge library of string data types but still has many errors. In software testing, identifying bugs is concerned. In symbol execution, string constraints are also considered in most of cases. Deterministic Finite Automaton (DFA) is used to solve string constraints based on specific algorithms (CT[5]).
2.5. The conclusion of chapter 2
This chapter has presented the issues related to symbol execution and extension tools with an effective role in test case automatic generation. The main content related to symbol execution includes various symbol
13
execution techniques; advantages and disadvantages of these techniques and the processings that need to apply symbolic execution in software testing progress. Chapter 2 also addresses the difficulties in implementing the symbolic execution and several solutions are pointed out in this chapter. Constraint modeling approaches using string graphs, constructing constraints on string graphs, and preprocessing to detect unsatisfied constraints are presented in this chapter. The contents presented in chapter 2 are the research results from the works CT[1], CT[2], CT[3], CT[5].
CHAPTER 3. STRING CONSTRAINT SOLVING
In this chapter, the thesis presents a proposal to improve string constraint solving in symbol execution based on automata theory and apply it on string data type. Experimental results, evaluation of text data processing of the improved model with some existing models are also shown in this chapter. The contents of this chapter are published in [CT4], [CT5], [CT6] of PhD student.
3.1. String constraint solver
Many applications depend on the handling of a text data type. Since the use of the Internet has become common with the proliferation of interactive applications, this problem becomes more serious. Interactive applications include social media, chat, sensitive and important information management, personal information management.
The input data of the programs is used in SQL query statements and stored in database management systems. This allows users to access directly to the database. The reason is that the input data is open and public and it is also open to other objects. Some objects have bad intentions, so "cleaning" the text input is a requirement that appears mostly in web services.
The current methods of applying symbol execution on string code are divided into two groups: based on
automata [5, 53] and based on Bitvector [7]. 3.2. Bitvector and satisfiability modulo theories (SMT) 3.2.1. Satisfiability modulo theories (SMT)
The study of this thesis uses the SMT Z3 constraint solver [70] constructed by Microsoft. This study also considers to the use of CVC [25] but realizes that it is slower than Z3 because of using the SMT-lib2 standard for the bitvector constraint. The thesis also applies alternative symbols for addressing by byte instead of by bit. For example, a [0: 7] = 01100010 will be replaced with a [0] = 'b'. Constraint resolving by bitvector is beyond the scope of this study. 3.2.2. String constraint solving based on BitVector
Consider the example where the edge (contains, Sa, Sb) corresponds to the constraint Sa.contains(Sb), it is provided that the current estimate for the lengths of Sa and Sb is 5, 3 respectively, the constraint result for this edge is:
(Sa[0]=Sb[0])˄(Sa[1]=Sb[1]) ˄ (Sa[2]=Sb[2]) ∨((Sa[1]=Sb[0]) ˄ (Sa[2]=Sb[1]) ˄ (Sa[3]=Sb[2]) ∨((Sa[2]=Sb[0])˄(Sa[3]=Sb[1])˄(Sa[4]=Sb[2]) If the platform of constraint solver supports it can be simplified the array type,
to (Sa[0:2]=Sb)∨(Sa[1:3]=Sb) ∨(Sa[2:4]=Sb). The process of compiling the edges of the string graph into the BitVector constraints is given in the second column of Table 3.1
14
Table 3.1. Construction of constraint corresponding to string operators.
Edge Automation formula Bitvector constraint
(charAt,sa,n,x) (concat,sa,sb,sx) Sa[n]=x (sa=sx[la:0]) (sb=sx[lx:la])
(indexOf, sa,sb,x)
(lastIndexOf,
sa,sb,x) (sa[x+lb:x]=sb)
sa[i+lb:i] sb, i,0 i Ma= Ma [n] {x} [*]
Ma= Ma substring(Mx,0,la)
Mb=Mb subString(Mx,lb, )
Mx=Mx (Ma Mb)
Ma= Ma ([x] Mb) Mb [*]
Mb=Mb substrings(Ma,x,x+lb)
Ma= Ma [*] Mb ([x+lb]
] ([*] Mb [*]))
Mb= Mb substrings(Ma,x,x+lb) sa[n+lx:n]=sx substring(sa,n,sx) Ma= Ma [n] Mx Mx= Mx substring(Ma,n,∞) Sa[n+lx:n]=sx substring(sa,n,k,sx) Ma= Ma [n] Mx [*] trim(sa,sx) Mx= Mx substring(Ma,n,k)
Ma= Ma [„˽‟,*] Mx [„˽‟,*]
Mx= Mx trim(Ma) (contains, sa,sb) (sx[0] ˽) sx[lx-1] „˽‟)
(sa[j]=
„˽‟ (sa[i+lx:i]=sx) (sa[k]=
„˽‟), i,j,k |(0 i la-
lx) (0 j i) (i+lx k Ma= Ma [*] Mb [*]
Mb= Mb substring(Ma,0,∞) (endsWith, sa,sb) sa[la:la-lb]=sb Ma= Ma [*] Mb
Mb=suffises(Ma) sa[lb:0]=sb (startsWith, sa,sb) Ma:= Ma Mb [*] Mb:=prefixes(Ma) 3.3. String constraint solving based on automata In this algorithm, when the edges in the processing of the automata are modified to reflect the effects of
the constraints, the nature of the modification depends on the constraints. For example, edge (contains, sa,sb)
corresponds to the constraint sa.contains(sb). atomatonSolver(PathCondition pc;StringGraph StringGraph sg=(I S,E, Ʊ))
for( edges si S)
Automaton Mi=[li];
W=E
while W≠∅
W=W\e
for(si connects to e)
Update Mi based on constraint e;
if(Mi=∅)
pc=pc freshConstranints(pc:sg); 1:
2:
3:
4:
5:
6:
7:
8:
9: return(false,pc,sg);
else if(Mi is modified)
W=W {all edges connected to si};
return checkNegativeEdges(pc;sg;(M1,M2..)); Figure 3.1. String constraint solving based on automata For negative constraints such as !equals, !constain, !startsWith and !endsWith, assumption that M1 and
M2 are two automata corresponding to two string variables s1,s2 that have to solve the constraint s1.!equals(s2).
Then, there are three following cases: Case 1: Both automata guess only one word and then this constraint is satisfied when these two words are not equal. Case 2: There is only one automation that can guess a word. Then, we need to remove that word from the other automation. In this case, the constraint will also be satisfied. Case 3: In all other cases, the constraints are satisfied and no automation must be changed. The main
part of the algorithm in Figure 3.6 describes formally the steps used for finding the solutions to the negative
constraints. 1:
2:
3:
4:
5:
6:
7:
8:
9:
10:
11:
12:
13 CheckNegativeEdges(pathcondition pc, StringGraph sg, (M1,M2..))
i=1; M‟1=M1; M‟2=M2;… M‟k=Mk;
while(1≤i Figure 3.2. Negative constraint solving procedure Making an update to the constraints: If the string constraints are concluded as not satisfied, the new integer constraints are added to the path constraint. 3.4. Proposing string constraint solving in symbolic constraint The main steps to perform execution include: Step 1: The path constraint condition will be converted to an intermediate format. Step 2: Solve or simplify the
intermediate format.
Step 3: Replace integer symbol variables with conjecture specific values.
Step 4: Convert intermediate format to automata or Bitvector operators.
Step 5: If constraint solving on the automata or bitvector fails, repeat Step 3 with other integer specific values in
the set of best integer values. The process of string constraint modeling using the graph is shown in the diagram below: Figure 3.3. String constraint modeling using the graph diagram The process is as follows: In both cases above, the results are evaluated by using a set of suitable assessments for each of the mentioned cases. The evaluator is used to compare the constraint solvers. 3.4.2 Discovering more string constraints in string data For a symbolic execution model, in addition to a string variable, all variables include integer variables,
which are assigned run-time values collected in dynamic symbol execution. This study is not modeling for
mixed constraint solving (constraint solving for mixed string and integer data). However, in string constraint
modeling, the execution progress extracts integer constraints on the length of the strings. This helps to find a
satisfied length assignment. Thus, integer constraints are arised on string data. Most modern constraint solvers
for integer data are efficient and accurate. Hence, generating integer constraints is not interfere with execution. 3.5. Experiment and result evaluation All experiments are conducted on a Windows 10 system with 8 GB RAM, using the open source tool JavaPathFinder downloaded at http://javapathfinder.sourceforge.net/ và phần mềm Eclipse Java EE IDE SOLVE(PathCondition pc)
StringGraph sg
(pc,sg)←BuildStringGraph(pc);
boolean sat←false;
while (!sat˄!timeout)
(sat,pc,sg) ←IntegerSolver(pc,sg)
if(sat)
(sat,pc,sg) ←StringSolver(pc,sg)
if( pc unchanged): break;
return sat
BuildStringGraph(PathCondition pc)
StringGraph sg:= ∅
for( String or mixed contraint c pc)
sg:=sg Hyperedge(c)
return Preprocess(pc,sg) Figure 3.4: String constraint solving algorithm The algorithm follows two these main steps
Step 1: JPF constraint solver will be used for types of constraints on integer, real number, boolean, v..v..
Step 2: If Step 1 perform successfully, it will provide some specific values. These values will be loaded to the
string graph (the associated values in the mixed constraint) and a string constraint solver is called to solve them. 1:
2:
3:
4:
5:
6:
7:
8:
9:
10:
11:
12:
13:
14:
15:
16:
17:
18:
19:
20: public static String s = "http://www./BigWeb";
private static boolean IsBigWebQuery(String str) {
// (1) Check if after the last „/‟ there is //exist"BigWeb"
int lastSlash = str.lastIndexOf('/');
if (lastSlash < 0) {
return false;
}
String rest = str.substring(lastSlash + 1);
if (!rest.contains("BigWeb")) {
return false;
}
// (2) Check if string s begins with "http://"
if (!str.startsWith("http://")) {
return false;
}
//(3) Get a middle string of http://, the last //character of "/" and if it starts by “www”. //Remove
them
String t =
str.substring("http://".length(), lastSlash);
if (t.startsWith("www.")) { t = t.substring("www.".length());
}
//4) Check if after removing “www”, the string //still contains "other.com" or "google.com"
if (!(t.equals("other.com")) && !(t.equals("google.com"))) {
return false;
}
return true;
}
public static void doTest() {
IsBigWebQuery(s);
} 21:
22:
23:
24:
25:
26:
27:
28:
29:
30:
31:
32:
33:
34:
35:
36:
37:
38: Figure 3.5. Testing Java program Figure 3.6. Result of generating test case on some operations of strings Figure 3.7. code under test
Starting with a Java program inferred from path conditions, a string graph is constructed and
preprocessed. This graph is transformed to autotomata operators and Bitvector constraints. These constraints are
solved. The conversions among constraint solvers are made (if necessary) with limited character sets in
{„ ‟,‟a‟,‟b‟}. Figure 3.9 shows a Java program that consists of a function with two string parameters. In line 1, there is
a branch where the first string parameter can be preceded by a space. If the parameters are equal, line 2 creates a
new string variable by removing whitespace at the ends (and this new variable will never satisfy if statements on
line 1). Line 3, check if the new variable ends up with a string "ab". Otherwise, two parameters are check if not
equal in line 7. Finally, the last branch is on line 12. This line checks if the fifth character of the first string
variable equals the character „a‟. Figure 3.9. A Java program In this program, with two random strings, it would be very difficult to execute any one of the four
branches of the program. Therefore, we have to come up with the best data to execute this program with all
possible branches. To execute these branches, all the possible path constraints of the program is built. In this
program, there are 12 path conditions but the study selects only two path conditions for illustrating. The first
condition, the branch on line 1 is taken, followed by the branch on line 5. The branch on line 9 satisfies these
possible data flows according to the constraints that must be met. Table 3.2: Representation of constraints in edges of string graph in Figure 3.10 Constraint Edge s1.startsWith(' ')
s1.equals(s2 )
news1.endsWith('ab')
s1.equals(s2)
s1.indexOf (5) == ('a') (startsWith, s1,‟˽‟)
(equals, s1, s2)
(notEndsWith, s3,”ab”)
(notEquals, s1, s2)
(indexOf , s1,‟a‟,5)
(trim, s1, s3) The choosing this path condition leads to the construction the string graph shown in Figure 3.10 with the edges given in the Table 3.2 above. Figure 3.10. The string graph 1 Running the preprocessing tuble on the graph shows the constraint with values not satisfying the edge
(equals,s1, s2 và notequals(s1,s2). Preprocessing tuble will conclude UNSAT and a new path condition is selected.
Suppose that the next path condition are cases on lines 1, 3, 9 or s1 must be begun with a space and it has
to equal to s2 after cutting the spaces at both ends. This string also must end with the string "ab" and have a 5th
character as „a‟. Thus, it creates a constraint. Table 3.3. Representation of constraints in edges of string graph in Figure 3.11 Constraint Edge s1.startsWith(' ')
s1.equals(s2 )
news1.endsWith('ab')
s1.indexOf (5) == ('a') (startsWith, s1,‟˽‟)
(equals, s1, s2)
(endsWith, s3,”ab”)
(indexOf , s1,‟a‟,5)
(trim, s1, s3) This path condition will correspond to following string graph: Figure 3.11. The string graph 2 Applying the preprocessor on above string graph will remove the vertex s2 from the graph and will remember that the solution of s1 is also the solution of s2 when the solutions are reported back to the call. The following integer constraints are added to prepare the integer constraint solver and will give its first
solution. According to the approach of this study, it is called the best prediction (li is the length of the symbolic
variable si): l1>0 l2>0 l3>0 l1=l2 l3 2 l1 l3 l1 1 l1>5
Before starting the string constraint solver, the integer constraint solver is called to give the specific set
of satisfied integers. Suppose that these values are l1=6, l2=6, l3=2. These values supply a suitable solution
because they are the smallest values for which each variable will satisfy the constraint. Considering the approach of first automata maps each string si representing the automation Mi. Each automation is initialized to contain all words of length li. Thus M1=…… and M3=… Each edge of the graph is traversed. i.
ii. iii. iv. startsWith(s1,‟˽‟) M1 is intersected with „˽‟.* and re-assign for M1. The new result of M1 is ˽…..
trim(s1,s3) M1 is intersected with language of M3, in which M3 are the words with spaces at the
beginning and the ending M1 ˽.* M3 ˽.*. Thus, language of M1 is ˽…… M3 will be intersected
with language of M1 after removing all spaces at the beginning and the ending. As the result,
language of M3 is [a-b],[b-a]
indexOf(s1,‟a‟,5) M1 is intersected with language ….a.* and the result of intersection is assigned to
M1 as ˽….a
endsWith(s3,”ab”) M3 is intersected with language “ab”. The obtained language of M3 is “ab”.
Similarly as above steps, after traversing edges, we have M1= ˽….a. and M3=ab. and the final result is s1,s3 will be “˽ ˽ ˽ ˽ ˽ab” and s3 will be “ab” By Bitvector approach, the best estimation l1=6, l2=6, l3=2 is used. Two Bitvector variables b1 and b3 i.
ii. iii. iv. are generated corresponding to the length 6 and 2. Traversing each edge of graph we get:
startsWith(s1,‟˽‟) creates the constraint as b1[0]= ‟˽‟. This constraint is added.
trim(s1,s3) will add these constraints b1[i]=b3[0] b1[i+1]=b3[1] where i=0,i After each edge, we push the constraints onto the Bitvetor constraint solving stack. It will be concluded
as SAT or UNSAT. When the constraint endsWith is pushed on the stack, the result as UNSAT is obtained since
b1 ends with „a 'and b3 ends with„ b ‟. When UNSAT is got from Bitvector solver and new constraints l1 6
l3 2 are added, the state of constraint solver is deleted and the transfer progress is restarted from the first edge.
Some best predictions can be done. In this time, with l1=7 l2=7, l3=2, the conversion will be similar with
previous interations. In fact, variable b1 will end with „b‟ and Bitvector solver will report as SAT. The satisfied
assign statements are returned by s1, s3 as“˽ ˽ ˽ ˽ ˽ab” and s3 as “ab”. 3.6. Conclusion of chapter 3 This chapter of the thesis presents the proposals related to string constraints solving in symbol execution.
From the concepts of string graph, string constraint solve based on the automata and BitVector methods.
Simultaneously, this chapter also proposes a model used to string constraint solving based on automata. The
analysis and evaluation obtained results are also presented in this chapter. CONCLUSION The main contributions of the thesis: The thesis studies an overview of symbol execution techniques in automatic generating test cases. Based
on that, the selection of the problem to improve the string constraint solver in order to increase the ability to
solve the constraint and to apply in automatic generating test cases for string data. After the research process, the
main contributions of the thesis include: To model the constraints in the form of a string graph. Then, to simplify the constraint or detect
constraints that do not have satisfied values in order to reduce the time to solve the string and mixed constraints. To propose a constraint solving algorithm based on the results in the preprocessing step on two
approaches: based on automata and based on Bitvector. From this result, the progress of automatic generating
test data on a string data is introduced. To implement the proposed algorithms with the extension of the Java Path Finder to install the constraint
modeling using the string graph and the constraint solving on the Java programming language. Experimental
results of these algorithms on a number of typical string operators are also presented.
The future works: In order to continue the results of the thesis, the researches in the future can be done in the following directions: LIST OF THE PUBLISHED
[CT1]. Hà Thị Thanh, Tô Hữu Nguyên, Nguyễn Hồng Tân, Nguyễn Văn Việt, Nguyễn Lan Oanh (2013), “Ứng dụng giải thuật di truyền vào sinh testcase hiệu quả”, Tạp chí Khoa học và công nghệ - Đại học Thái Nguyên, Tập 110, số 10, trang 131-135. [CT2]. Tô Hữu Nguyên, Nguyễn Hồng Tân, Hà Thị Thanh, Đỗ Thanh Mai (2016), “Thực thi tượng trưng trong sinh tự động dữ liệu kiểm thử phần mềm”, Tạp chí khoa học đại học Đà Lạt, Tập 6, Số 2, trang 258–178. [CT3]. To Huu Nguyen, Tran Thi Ngan, Do Thanh Mai, Tran Manh Tuan (2016), “A novel symbolic execution model in automated generation of test cases”, International Research Journal of Engineering and Technology (IRJET), Volume: 03 Issue: 07. Page 1-7. [CT4]. Tô Hữu Nguyên, Đỗ Thanh Mai, Nguyễn Hồng Tân(2016), “Kỹ thuật mô hình hóa ràng buộc hỗn hợp và phương pháp giải ràng buộc xâu trong thực thi tượng trưng”, Tạp chí Khoa học và Công nghệ, Đại học Thái Nguyên, 159(14), tr.141-147 [CT5]. To Huu Nguyen, Tran Thi Ngan (2020), Finding new integer constraints incorporated with string constraints using automata in symbolic execution, International Research Journal of Engineering and Technology (IRJET), Volume 7, Issue 6, pp. 4144 – 4148. [CT6]. To Huu Nguyen, Nguyen Truong Thang, Dang Van Duc (2020), Mixed constraint solvers in symbolic execution using multi-track automata, International Journal of Latest Engineering Science (IJLES), Volume: 03 Issue: 04 .15
10:
11:
12:
13
16
Step 6: Constraint solving progress ends if a satisfactory string value is found or within a limited time, no new
integer value can be found, and constraint solving progress on the automata and bitvector has no matching value.
.
3.4.1 String constraint modeling using graph
17
Implementation
1:
2:
3:
4:
5:
6:
7:
8:
9:
11:
12:
13:
14:
18
19
3.5.2 Experimental results
This section presents the results of the entire method as described above.
20
21
22
23
24