Bucketing Failing Tests via Symbolic Analysis

A common problem encountered while debugging programs is the overwhelming number of test cases generated by automated test generation tools, many of which fail due to the same bug. Coarse-grained clustering techniques based on the point of failure (PFB) and stack hash (CSB) have been proposed to address this problem. In this work, we propose a new symbolic-analysis-based clustering algorithm that uses the semantic reason behind failures to group failing tests into more "meaningful" clusters. We implement our algorithm within the KLEE symbolic execution engine; our experiments on 21 programs drawn from multiple benchmark suites show that our technique produces more fine-grained clusters than the PFB and CSB clustering schemes. As a side effect, our technique also provides a semantic characterization of the fault represented by each cluster, a valuable hint for guiding debugging. A user study conducted among senior undergraduates and master's students further confirms the utility of our test clustering method.
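
To make the contrast between the three bucketing schemes concrete, here is a minimal sketch, not the authors' implementation. The `FailingTest` record and its `failure_condition` field are hypothetical stand-ins for per-path data that a symbolic executor such as KLEE could expose; the point is only that PFB keys on the crash location, CSB keys on a hash of the call stack, and the semantic scheme keys on the (normalized) symbolic reason for the failure.

```python
import hashlib
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class FailingTest:
    # Hypothetical record of one failing test; field names are illustrative.
    test_id: str
    fail_file: str          # source file of the failing statement
    fail_line: int          # line number of the failing statement
    call_stack: List[str]   # function names at the failure, innermost first
    failure_condition: str  # simplified symbolic constraint at the failure

def pfb_key(t: FailingTest) -> str:
    # Point-of-failure bucketing: same crashing location => same bucket.
    return f"{t.fail_file}:{t.fail_line}"

def csb_key(t: FailingTest) -> str:
    # Stack-hash bucketing: same call stack at the failure => same bucket.
    return hashlib.sha1("|".join(t.call_stack).encode()).hexdigest()

def semantic_key(t: FailingTest) -> str:
    # Symbolic-analysis bucketing: tests whose failures share the same
    # normalized symbolic reason land in the same bucket, and the key
    # itself characterizes the fault.
    return t.failure_condition

def bucket(tests: List[FailingTest],
           key_fn: Callable[[FailingTest], str]) -> Dict[str, List[str]]:
    # Group test ids by the chosen bucketing key.
    buckets: Dict[str, List[str]] = defaultdict(list)
    for t in tests:
        buckets[key_fn(t)].append(t.test_id)
    return dict(buckets)

# Two tests that crash at the same statement, via the same call stack,
# but for different symbolic reasons: PFB and CSB merge them into one
# bucket, while the semantic key keeps them apart.
tests = [
    FailingTest("t1", "parse.c", 42, ["parse", "main"], "(len > 8)"),
    FailingTest("t2", "parse.c", 42, ["parse", "main"], "(buf[0] == 0)"),
]
print(bucket(tests, pfb_key))       # one bucket: {"parse.c:42": ["t1", "t2"]}
print(bucket(tests, semantic_key))  # two buckets, one per failure reason
```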
