Hunting for Bugs in Code Coverage Tools via Randomized Differential Testing

Reliable code coverage tools are critically important because code coverage is heavily used to facilitate many quality assurance activities, such as software testing, fuzzing, and debugging. However, little attention has been devoted to assessing the reliability of code coverage tools. In this study, we propose a randomized differential testing approach to hunting for bugs in the most widely used C code coverage tools. Specifically, by generating random input programs, our approach searches for inconsistencies in the code coverage reports produced by different code coverage tools and then identifies these inconsistencies as potential code coverage bugs. To effectively report code coverage bugs, we addressed three specific challenges: (1) how to filter out duplicate test programs, as many of them trigger the same bugs in code coverage tools; (2) how to automatically reduce large test programs to much smaller ones that have the same properties; and (3) how to determine which code coverage tools have bugs. The extensive evaluations validate the effectiveness of our approach, resulting in 42 and 28 confirmed/fixed bugs for gcov and llvm-cov, respectively. This case study indicates that code coverage tools are not as reliable as might have been envisaged. It not only demonstrates the effectiveness of our approach but also highlights the need to continue improving the reliability of code coverage tools. This work opens up a new direction in code coverage validation, which calls for more attention in this area.
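The core comparison step can be illustrated with a minimal sketch. It assumes each tool's report has already been parsed into a mapping from source line number to execution count, with None marking a line the tool treats as non-executable. The function name, the data layout, and the policy of skipping lines that either tool does not instrument are assumptions made for illustration; they are not taken from the paper.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical representation of one coverage report:
# line number -> execution count, or None if the tool
# reports the line as non-executable.
Coverage = Dict[int, Optional[int]]


def find_inconsistencies(
    report_a: Coverage, report_b: Coverage
) -> List[Tuple[int, int, int]]:
    """Return (line, count_a, count_b) for lines where the reports disagree.

    Assumed policy: only lines that BOTH tools consider executable are
    compared, so differences in instrumentation granularity are ignored.
    """
    diffs = []
    for line in sorted(set(report_a) | set(report_b)):
        count_a = report_a.get(line)
        count_b = report_b.get(line)
        if count_a is None or count_b is None:
            continue  # not instrumented by at least one tool
        if count_a != count_b:
            diffs.append((line, count_a, count_b))
    return diffs


# Made-up reports for the same test program from two tools.
tool_a = {1: 1, 2: 0, 3: 5, 4: None}
tool_b = {1: 1, 2: 1, 3: 5, 4: 2}
print(find_inconsistencies(tool_a, tool_b))  # [(2, 0, 1)]
```

A disagreement found this way is only a candidate bug: it shows that at least one report is wrong, not which one, which is why the third challenge above still requires separate analysis.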
