Understanding user understanding: determining correctness of generated program invariants

Recently, work has begun on automating the generation of test oracles, which are necessary to fully automate the testing process. One approach to such automation involves dynamic invariant generation, which extracts invariants from program executions. To use such invariants as test oracles, however, it is necessary to distinguish correct from incorrect invariants, a process that currently requires human intervention. In this work we examine this process. In particular, we examine the ability of 30 users, across two empirical studies, to classify invariants generated from three Java programs. Our results indicate that users struggle to classify generated invariants: on average, they misclassify 9.1% to 31.7% of correct invariants and 26.1% to 58.6% of incorrect invariants. These results contradict prior studies that suggest that classification by users is easy, and indicate that further work is needed to bridge the gap between the effectiveness of dynamic invariant generation in theory and the ability of users to apply it in practice. Along these lines, we suggest several areas for future work.
