:: End of manually writing unit tests. Why not generate test cases automatically and smartly?

CUTE :: A Concolic Unit Testing Engine for C and Java

Open System Laboratory | Computer Science Department | University of Illinois at Urbana Champaign
Case Studies

Case Studies

We illustrate two case studies that show how CUTE can detect errors. In the second case study, we also present results that show how CUTE achieves branch coverage of the code under test. We performed all experiments on a Linux machine with a dual 1.7 GHz Intel Xeon processor.

Data Structures of CUTE

We applied CUTE to test its own data structures. CUTE uses a number of non-standard data structures at runtime, such as cu_linear to represent linear expressions, cu_pointer to represent pointer expressions, cu_depend to represent dependency graphs for path constraints etc. Our goal in this case study was to detect memory leaks in addition to standard errors such as segmentation faults, assertion violation etc. To that end, we used CUTE in conjunction with valgrind [3]. We discovered a few memory leaks and a couple of segmentation faults that did not show up in other uses of CUTE. This case study is interesting in that we applied CUTE to partly unit test itself and discovered bugs. We briefly describe our experience with testing the cu_linear data structure.
We tested the cu_linear module of CUTE in the depth-first search mode of CUTE along with valgrind. In 537 iterations, CUTE found a memory leak. The following is a snippet of the function cu_linear_add relevant for the memory leak:
 
cu_linear *
cu_linear_add(cu_linear *c1, cu_linear *c2, int add) {
  int i, j;
  cu_linear* ret=(cu_linear*)malloc(sizeof(cu_linear));
  ... // skipped 18 lines of code
  if(ret->count==0) return NULL;

If the sum of the two linear expressions passed as arguments becomes constant, the function returns NULL without freeing the memory allocated for the local variable ret. CUTE constructed this scenario automatically at the time of testing. Specifically, CUTE constructed the sequence of function calls l1=cu_linear_create(0); l1=cu_linear_create(0); l1=cu_linear_negate(l1); l1=cu_linear_add(l1,l2,1); that exposes the memory leak that valgrind detects.

SGLIB Library

We also applied CUTE to unit test SGLIB [2] version 1.0.1, a popular, open-source C library for generic data structures. The library has been extensively used to implement the commercial tool Xrefactory. SGLIB consists of a single C header file, sglib.h, with about 2000 lines of code consisting only of C macros. This file provides generic implementation of most common algorithms for arrays, lists, sorted lists, doubly linked lists, hash tables, and red-black trees. Using the SGLIB macros, a user can declare and define various operations on data structures of parametric types.
The library and its sample examples provide verifier functions (can be used as repOk) for each data structure except for hash tables. We used these verifier functions to test the library using the technique of repOk mentioned in  [1]. For hash tables, we invoked a sequence of its function. We used CUTE with bounded depth-first search strategy with bound 50. Figure 1 shows the results of our experiments.
We chose SGLIB as a case study primarily to measure the efficiency of CUTE. As SGLIB is widely used, we did not expect to find bugs. Much to our surprise, we found two bugs in SGLIB using CUTE.
The first bug is a segmentation fault that occurs in the doubly-linked-list library when a non-zero length list is concatenated with another zero-length list. CUTE discovered the bug in 140 iterations (about 1 seconds) in the bounded depth-first search mode. This bug is easy to fix by putting a check on the length of the second list in the concatenation function.
The second bug, which is a more serious one, was found by CUTE in the hash table library in 193 iterations (in 1 second). Specifically, CUTE constructed the following valid sequence of function calls which gets the library into an infinite loop:
typedef struct ilist { int i; struct ilist *next; } ilist;
ilist *htab[10];
main() {
  struct ilist *e,*e1,*e2,*m;
  sglib_hashed_ilist_init(htab);
  e=(ilist *)malloc(sizeof(ilist)); e->next = 0; e->i=0;
  sglib_hashed_ilist_add_if_not_member(htab,e,&m);
  sglib_hashed_ilist_add(htab,e);
  e2=(ilist *)malloc(sizeof(ilist)); e2->next = 0; e2->i=0;
  sglib_hashed_ilist_is_member(htab,e2); 
}

where ilist is a struct representing an element of the hash table. We reported these bugs to the SGLIB developers, who confirmed that these are indeed bugs.
NameRun time# of # of Branches% Branch # of FunctionsOPT 1 OPT 2# of Bugs
in secondsIterationsExploredCoverage Tested in %& 3 in %Found
Array Quick Sort2 732 43 97.73 2 67.80 49.13 0
Array Heap Sort4 1764 36 100.00 2 71.10 46.38 0
Linked List2 570 100 96.15 12 86.93 88.09 0
Sorted List2 1020 110 96.49 11 88.86 80.85 0
Doubly Linked List3 1317 224 99.12 17 86.95 79.38 1
Hash Table 1 193 46 85.19 8 97.01 52.94 1
Red Black Tree2629 1,000,000 242 71.18 17 89.65 64.93 0
Figure 1: Results for testing SGLIB 1.0.1 with bounded depth-first strategy with depth 50
Figure 1 shows the results for testing SGLIB 1.0.1 with the bounded depth-first strategy. For each data structure and array sorting algorithm that SGLIB implements, we tabulate the time that CUTE took to test the data structure, the number of runs that CUTE made, the number of branches it executed, branch coverage obtained, the number of functions executed, the benefit of optimizations, and the number of bugs found.
The branch coverage in most cases is less than 100%. After investigating the reason for this, we found that the code contains a number of assert statements that were never violated and a number of predicates that are redundant and can be removed from the conditionals.
The last two columns in Figure 1 show the benefit of the three optimizations discussed in[1]. The column OPT 1 gives the average percentage of executions in which the fast unsatisfiability check was successful. It is important to note that the saving in the number of satisfiability checks translates into an even higher relative saving in the satisfiability-checking time because lp_solve takes much more time (exponential in number of constraints) to determine that a set of constraints is unsatisfiable than to generate a solution when one exists. For example, for red-black trees and depth-first search, OPT 1 was successful in almost 90% of executions, which means that OPT 1 reduces the number of calls to lp_solve an order of magnitude. However, OPT 1 reduces the solving time of lp_solve more than two orders of magnitude in this case; in other words, it would be infeasible to run CUTE without OPT 1. The column OPT 2 & 3 gives the average percentage of constraints that CUTE eliminated in each execution due to common sub-expression elimination and incremental solving optimizations. Yet again, this reduction in the size of constraint set translates into a much higher relative reduction in the solving time.

References

[1]
K. Sen, D. Marinov, and G. Agha. CUTE: A concolic unit testing engine for C. In 5th joint meeting of the European Software Engineering Conference and ACM SIGSOFT Symposium on the Foundations of Software Engineering (ESEC/FSE'05). ACM, 2005.
[2]
SGLIB. http://xref-tech.com/sglib/main.html.
[3]
Valgrind. http://valgrind.org/.



File translated from TEX by TTH, version 3.71.
On 28 Jan 2006, 15:22.
Copyright © 2006 University of Illinois at Urbana Champaign. All rights Reserved