CarFast: achieving higher statement coverage faster

Test coverage is an important metric of software quality because it indicates how thoroughly an application has been tested. In industry, test coverage is often measured as statement coverage. A fundamental problem of software testing is how to achieve higher statement coverage faster. The problem is difficult because testers must find input data that steer execution sooner toward sections of application code that contain more statements. We created a novel, fully automatic approach for aChieving higher stAtement coveRage FASTer (CarFast), which we implemented and evaluated on twelve generated Java applications whose sizes range from 300 LOC to one million LOC. We compared CarFast with several popular test case generation techniques, including pure random testing, adaptive random testing, and Directed Automated Random Testing (DART). Our results indicate with strong statistical significance that, when execution time is measured as the number of runs of the application on different input test data, CarFast outperforms the evaluated competing approaches on most subject applications.
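To make the evaluation metric concrete, the following is a minimal sketch of how statement coverage can be tracked as a function of the number of runs. It is not the CarFast algorithm. The toy application model (`executed_statements`), its branch sizes, and the helper names `coverage_curve` and `runs_to_reach` are all assumptions introduced here for illustration.

```python
# Hypothetical toy model, for illustration only: it is not taken from the paper.
TOTAL_STATEMENTS = 100  # assumed size of the toy application


def executed_statements(x):
    """Return the ids of the statements that run for integer input x."""
    covered = set(range(0, 5))          # 5 statements execute on every run
    if x > 900:                         # rarely taken branch with a large body
        covered |= set(range(5, 85))    # 80 statements
    else:                               # commonly taken branch with a small body
        covered |= set(range(85, 100))  # 15 statements
    return covered


def coverage_curve(inputs):
    """Cumulative statement coverage (as a fraction) after each run."""
    seen = set()
    curve = []
    for x in inputs:
        seen |= executed_statements(x)
        curve.append(len(seen) / TOTAL_STATEMENTS)
    return curve


def runs_to_reach(curve, target):
    """Number of runs needed to reach the target coverage, or None."""
    for run_number, coverage in enumerate(curve, start=1):
        if coverage >= target:
            return run_number
    return None


if __name__ == "__main__":
    # Same two inputs, different order: the large branch is hit late or early.
    late = coverage_curve([10, 950])
    early = coverage_curve([950, 10])
    print(late, runs_to_reach(late, 0.8))    # [0.2, 1.0] 2
    print(early, runs_to_reach(early, 0.8))  # [0.85, 1.0] 1
```

In this toy model both input orders end at full coverage, but the order that reaches the statement-heavy branch first crosses 80% coverage in one run instead of two. That gap, measured in runs, is the kind of difference the comparison described above is concerned with.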
