The impact of test case summaries on bug fixing performance: an empirical investigation

Automated test generation tools have been widely investigated with the goal of reducing the cost of testing activities. However, generated tests have been shown not to help developers detect more bugs, even though they reach higher structural coverage than manually written tests. The main reason is that generated tests are difficult to understand and maintain. Our paper proposes an approach, coined TestDescriber, which automatically generates a summary of the portion of code exercised by each individual test case, thereby improving understandability. We argue that this approach can complement current automated unit test generation tools and search-based techniques designed to generate a possibly minimal set of test cases. In evaluating our approach, we found that (1) developers find twice as many bugs, and (2) test case summaries significantly improve the comprehensibility of test cases, which developers consider particularly useful.
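To make the idea concrete, the following minimal sketch shows how a template-based summarizer might turn coverage facts about a single generated test into a natural-language summary; this is an illustration only, not TestDescriber's actual implementation, and all names and parameters are hypothetical:

```python
def summarize_test(test_name, class_under_test, covered_methods, branch_coverage):
    """Render a natural-language summary from per-test coverage facts
    (a simplified, template-based stand-in for what a tool like
    TestDescriber might produce for each generated test case)."""
    methods = ", ".join(covered_methods)
    return (
        f"Test '{test_name}' exercises class '{class_under_test}'. "
        f"It covers the methods: {methods}, "
        f"reaching {branch_coverage:.0%} branch coverage."
    )

# Example: summarizing a hypothetical generated test for a Stack class.
summary = summarize_test("testPushPop", "Stack", ["push", "pop"], 0.75)
print(summary)
```

Such a summary could be inserted as a comment above the generated test method, so that a developer reading the test immediately sees which behavior of the class under test it exercises.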
