Paper Information - Using Relative Lines of Code to Guide Automated Test Generation for Python

Raw lines of code (LOC) is a metric that does not, at first glance, seem extremely useful for automated test generation. It is both highly language-dependent and not extremely meaningful, semantically, within a language: one coder can produce the same effect with many fewer lines than another. However, relative LOC, between components of the same project, turns out to be a highly useful metric for automated testing. In this article, we make use of a heuristic based on LOC counts for tested functions to dramatically improve the effectiveness of automated test generation. This approach is particularly valuable in languages where collecting code coverage data to guide testing has a very high overhead. We apply the heuristic to property-based Python testing using the TSTL (Template Scripting Testing Language) tool. In our experiments, the simple LOC heuristic can improve branch and statement coverage by large margins (often more than 20%, up to 40% or more) and improve fault detection by an even larger margin (usually more than 75% and up to 400% or more). The LOC heuristic is also easy to combine with other approaches and is comparable to, and possibly more effective than, two well-established approaches for guiding random testing.
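The abstract does not spell out how the LOC counts enter the test generator. One plausible reading, offered here only as a minimal sketch and not as TSTL's actual implementation, is that a random tester picks the next function to call with probability proportional to that function's LOC relative to the others in the same project. The function names and LOC counts below are hypothetical, and `loc_weighted_choice` is an illustrative helper rather than part of any real tool.

```python
import random


def loc_weighted_choice(loc_by_function, rng):
    """Pick one function name with probability proportional to its LOC.

    loc_by_function: dict mapping a function name to its line count.
    rng: a random.Random instance, passed in so runs are reproducible.
    """
    names = list(loc_by_function)
    weights = [loc_by_function[name] for name in names]
    # random.Random.choices performs weighted sampling with replacement.
    return rng.choices(names, weights=weights, k=1)[0]


# Hypothetical LOC counts for three functions of a library under test.
loc_counts = {"insert": 40, "delete": 55, "is_empty": 5}

rng = random.Random(0)
picks = [loc_weighted_choice(loc_counts, rng) for _ in range(10_000)]

# Larger functions should be selected more often than smaller ones.
for name in loc_counts:
    print(name, picks.count(name))
```

Because only the ratios between the counts matter, this kind of weighting needs no runtime coverage instrumentation, which fits the abstract's point about languages where collecting coverage is expensive.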
