Statement frequency coverage: A code coverage criterion for assessing test suite effectiveness

Abstract

Context: Software testing is a pivotal activity in the development of high-quality software. As software evolves through its life cycle, the need grows for a fault-revealing criterion that assesses the effectiveness of its test suite. Over the years, researchers have proposed coverage-based criteria, including statement and branch coverage, to address this need. In the literature, the effectiveness of such criteria is assessed in terms of their correlation with the mutation score.

Objective: In this paper, we propose a coverage-based criterion named statement frequency coverage, which outperforms statement and branch coverage in terms of its correlation with the mutation score.

Method: To this end, we incorporated the frequency of executed statements into statement coverage to create a coverage-based criterion for assessing test suite effectiveness. Statement frequency coverage assigns each statement a continuous value proportional to the number of times the statement is executed during test execution. We evaluated our approach on 22 real-world Python projects, comprising more than 118,000 source lines of code (excluding blank lines, comments, and test cases) and 21,000 test cases, by measuring the correlation between statement frequency coverage and the corresponding mutation score.

Results: The results show that statement frequency coverage outperforms the statement and branch coverage criteria: its correlation with the corresponding mutation score is higher than that of statement and branch coverage. The results also show that, unlike statement and branch coverage, there is no statistically significant difference between statement frequency coverage and the mutation score.

Conclusion: Statement frequency coverage is a better choice than statement and branch coverage for assessing test suite effectiveness in real-world settings. Furthermore, we demonstrate that although statement frequency coverage subsumes statement coverage, it is incomparable to branch coverage under the adequate-test-suite condition.
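The abstract describes the metric only informally, and its exact formula is not given here. The Python sketch below shows one plausible way to compute such a score from per-statement execution counts, assuming each executed statement contributes a value proportional to its hit count, normalized by the largest hit count observed; the function name, the Counter-based input format, and the max-normalization are illustrative assumptions, not the paper's definition.

    from collections import Counter

    def statement_frequency_coverage(hit_counts, all_statements):
        """Score a test suite from per-statement execution counts.

        hit_counts: Counter mapping a statement id (e.g., a line number)
        to the number of times the test suite executed it.
        all_statements: set of all executable statement ids in the
        program under test (excluding blanks, comments, and tests).
        """
        if not all_statements:
            return 0.0
        max_hits = max(hit_counts.values(), default=0)
        if max_hits == 0:
            return 0.0  # the suite executed nothing
        # Each executed statement contributes a value in (0, 1] that is
        # proportional to its execution count; unexecuted statements
        # contribute 0 (assumed max-normalization).
        total = sum(hit_counts.get(s, 0) / max_hits for s in all_statements)
        return total / len(all_statements)

    # Hypothetical usage: lines 10-12 are executed, line 13 is never reached.
    hits = Counter({10: 4, 11: 4, 12: 1})
    print(statement_frequency_coverage(hits, {10, 11, 12, 13}))  # 0.5625

Under this reading, a statement hit as often as the hottest statement counts fully, a rarely hit statement counts fractionally, and an unexecuted statement counts for nothing, which makes the score sensitive to how thoroughly tests exercise each statement rather than merely whether they reach it.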

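The evaluation described in the abstract rests on correlating a coverage score with the mutation score across projects, and on testing whether the two score distributions differ. A minimal sketch of that style of analysis follows, using Kendall's tau-b from scipy.stats because it handles tied ranks, and a paired Wilcoxon signed-rank test for the difference claim; the abstract names neither the correlation coefficient nor the statistical test, so both choices and the sample data are assumptions.

    from scipy.stats import kendalltau, wilcoxon

    # Hypothetical per-project scores in [0, 1].
    coverage_scores = [0.62, 0.71, 0.55, 0.80, 0.67]
    mutation_scores = [0.58, 0.69, 0.50, 0.77, 0.70]

    # Rank correlation between the coverage criterion and mutation score.
    tau, p_tau = kendalltau(coverage_scores, mutation_scores)
    print(f"Kendall tau = {tau:.3f} (p = {p_tau:.3f})")

    # Paired test for a difference between the two score distributions,
    # in the spirit of the "no statistically significant difference"
    # claim (the abstract does not name the test; Wilcoxon is assumed).
    stat, p_diff = wilcoxon(coverage_scores, mutation_scores)
    print(f"Wilcoxon statistic = {stat:.1f} (p = {p_diff:.3f})")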