Do Pseudo Test Suites Lead to Inflated Correlation in Measuring Test Effectiveness?

Code coverage is the most widely adopted criterion for measuring test effectiveness in software quality assurance. How well coverage criteria indicate a test suite's effectiveness has been widely studied in prior work. Most of these studies use randomly constructed pseudo test suites to facilitate data collection for correlation analysis, yet no previous work has systematically studied whether pseudo test suites lead to inflated correlation results. This paper examines this potentially widespread threat with a study of 123 real-world Java projects. Following the typical experimental process for studying coverage criteria, we investigate the correlation between statement/assertion coverage and mutation score using both pseudo and original test suites. In addition to direct correlation analysis, we control for the number of assertions and the test suite size to conduct partial correlation analysis. The results reveal that 1) the correlation between coverage criteria and mutation score derived from pseudo test suites is much higher than that derived from original test suites (from 0.21 to 0.39 higher in Kendall value); and 2) contrary to what was previously reported, statement coverage has a stronger correlation with mutation score than assertion coverage.
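To make the analysis described above concrete, the following is a minimal Python sketch, not the authors' tooling. It assumes that pseudo test suites are built by sampling test cases without replacement from an original suite, that the "Kendall value" is Kendall's tau-b, and that the partial correlation uses the standard first-order formula for a single control variable. The function names `sample_pseudo_suites`, `kendall_tau_b`, and `partial_kendall_tau` are hypothetical and introduced only for illustration.

```python
import math
import random
from itertools import combinations


def sample_pseudo_suites(test_ids, sizes, rng):
    """Assumed construction: one random subset of the tests per requested size."""
    return [rng.sample(test_ids, k) for k in sizes]


def kendall_tau_b(x, y):
    """Kendall's tau-b between two equal-length numeric sequences (tie-aware)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    concordant = discordant = tied_x_only = tied_y_only = 0
    for i, j in combinations(range(len(x)), 2):
        dx, dy = x[i] - x[j], y[i] - y[j]
        if dx == 0 and dy == 0:
            continue  # tied in both variables: excluded from both factors
        if dx == 0:
            tied_x_only += 1
        elif dy == 0:
            tied_y_only += 1
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    untied = concordant + discordant
    denom = math.sqrt((untied + tied_x_only) * (untied + tied_y_only))
    if denom == 0:
        raise ValueError("tau-b is undefined when a variable is constant")
    return (concordant - discordant) / denom


def partial_kendall_tau(x, y, z):
    """Correlation of x and y after controlling for a single variable z.

    Uses tau_xy.z = (tau_xy - tau_xz * tau_yz)
                    / sqrt((1 - tau_xz^2) * (1 - tau_yz^2)).
    """
    t_xy = kendall_tau_b(x, y)
    t_xz = kendall_tau_b(x, z)
    t_yz = kendall_tau_b(y, z)
    denom = math.sqrt((1 - t_xz ** 2) * (1 - t_yz ** 2))
    if denom == 0:
        raise ValueError("control variable is perfectly correlated with x or y")
    return (t_xy - t_xz * t_yz) / denom
```

In such a setup, `x` would hold the coverage of each test suite, `y` its mutation score, and `z` the control variable (for example, test suite size); running the same computation once over the pseudo suites and once over the original suites would give the two correlation values being compared.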
