On the number and nature of faults found by random testing

Intuition suggests that random testing should exhibit considerable variation in the number of faults detected by two different runs of equal duration; random testing would then be rather unpredictable. This article first evaluates the variance over time of the number of faults detected by randomly testing object-oriented software equipped with contracts. It presents the results of an empirical study based on 1215 hours of random testing applied to 27 Eiffel classes, each with 30 seeds of the random number generator. The analysis of over 6 million failures triggered during the experiments shows that the relative number of faults detected by random testing over time is predictable, but that different runs of the random test case generator detect different faults. The experiments also suggest that random testing finds faults quickly: the first failure is likely to be triggered within 30 seconds. The second part of the article evaluates the nature of the faults found by random testing. To this end, it first introduces a fault classification scheme, which is then used to compare the faults found through random testing with those found through manual testing and with those found in field use of the software and recorded in user incident reports. The comparisons show that each technique is good at uncovering different kinds of faults; none of the techniques subsumes any of the others, and each brings distinct contributions. This supports a more general conclusion on comparisons between testing strategies: the number of detected faults is too coarse a criterion for such comparisons; the nature of the faults must also be considered. Copyright © 2009 John Wiley & Sons, Ltd.
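The setup the abstract describes (randomly generated call sequences driven by a seeded random number generator, with contract violations serving as the failure oracle) can be sketched in a few lines. The `Account` class, its contracts, and the `random_test` driver below are hypothetical illustrations of the general technique, not the study's actual Eiffel classes or testing tool:

```python
import random

class Account:
    """Toy class with executable contracts, in the spirit of Design by Contract."""

    def __init__(self):
        self.balance = 0

    def deposit(self, amount):
        assert amount > 0                      # precondition
        self.balance += amount
        assert self.balance >= 0               # class invariant

    def withdraw(self, amount):
        assert 0 < amount <= self.balance      # precondition
        self.balance -= amount
        assert self.balance >= 0               # class invariant

def random_test(seed, budget=1000):
    """Run `budget` randomly generated calls on a fresh object.

    Each contract violation is recorded as a failure; fixing the seed makes
    the run reproducible, mirroring the study's use of 30 seeds per class.
    (A real harness would discard direct precondition violations as invalid
    inputs rather than count them; this sketch simply counts all violations.)
    """
    rng = random.Random(seed)
    obj = Account()
    failures = 0
    for _ in range(budget):
        method = rng.choice([obj.deposit, obj.withdraw])
        arg = rng.randint(-10, 10)             # random argument, may be invalid
        try:
            method(arg)
        except AssertionError:
            failures += 1
    return failures
```

Because each run is driven by a single seeded generator, the same seed always reproduces the same call sequence and the same failure count, while different seeds explore different sequences; this is the mechanism behind the abstract's observation that distinct runs detect distinct faults.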

[1]  Eric Allen Bug Patterns in Java , 2002 .

[2]  Richard G. Hamlet,et al.  Partition Testing Does Not Inspire Confidence , 1990, IEEE Trans. Software Eng..

[3]  William G. Griswold,et al.  Dynamically discovering likely program invariants to support program evolution , 1999, Proceedings of the 1999 International Conference on Software Engineering (IEEE Cat. No.99CB37002).

[4]  Barton P. Miller,et al.  An empirical study of the robustness of Windows NT applications using random testing , 2000 .

[5]  Jim Gray,et al.  Why Do Computers Stop and What Can Be Done About It? , 1986, Symposium on Reliability in Distributed Software and Database Systems.

[6]  Marcelo d'Amorim,et al.  An Empirical Comparison of Automated Generation and Classification Techniques for Object-Oriented Unit Testing , 2006, 21st IEEE/ACM International Conference on Automated Software Engineering (ASE'06).

[7]  Sandro Morasca,et al.  On the analytical comparison of testing techniques , 2004, ISSTA '04.

[8]  K. Rustan M. Leino,et al.  The Spec# Programming System: An Overview , 2004, CASSIS.

[9]  James Miller,et al.  Comparing and combining software defect detection techniques: a replicated empirical study , 1997, ESEC '97/FSE-5.

[10]  Phyllis G. Frankl,et al.  All-uses vs mutation testing: An experimental comparison of effectiveness , 1997, J. Syst. Softw..

[11]  Phyllis G. Frankl,et al.  An Experimental Comparison of the Effectiveness of Branch Testing and Data Flow Testing , 1993, IEEE Trans. Software Eng..

[12]  Bertrand Meyer,et al.  Experimental assessment of random testing for object-oriented software , 2007, ISSTA '07.

[13]  Walter J. Gutjahr,et al.  Partition Testing vs. Random Testing: The Influence of Uncertainty , 1999, IEEE Trans. Software Eng..

[14]  David Hovemeyer,et al.  Finding bugs is easy , 2004, SIGP.

[15]  Jean-Marc Jézéquel,et al.  Design by Contract to Improve Software Vigilance , 2006, IEEE Transactions on Software Engineering.

[16]  Erik Kamsties,et al.  An Empirical Evaluation of Three Defect-Detection Techniques , 1995, ESEC.

[17]  Mats Per Erik Heimdahl,et al.  Specification test coverage adequacy criteria = specification test generation inadequacy criteria , 2004, Eighth IEEE International Symposium on High Assurance Systems Engineering, 2004. Proceedings..

[18]  Michael D. Ernst,et al.  Eclat: Automatic Generation and Classification of Test Inputs , 2005, ECOOP.

[19]  Koen Claessen,et al.  QuickCheck: a lightweight tool for random testing of Haskell programs , 2000, ICFP.

[20]  Elaine J. Weyuker,et al.  Analyzing Partition Testing Strategies , 1991, IEEE Trans. Software Eng..

[21]  Hong Zhu,et al.  Software unit test coverage and adequacy , 1997, ACM Comput. Surv..

[22]  A. Jefferson Offutt,et al.  A semantic model of program faults , 1996, ISSTA '96.

[23]  Catherine Oriat,et al.  Jartege: A Tool for Random Generation of Unit Tests for Java Classes , 2004, QoSA/SOQUA.

[24]  Bruno Legeard,et al.  Generation of test sequences from formal specifications: GSM 11‐11 standard case study , 2004, Softw. Pract. Exp..

[25]  S. N. Weiss,et al.  All-Uses versus Mutation Testing : An ExperimentalComparison of E ectiveness , 1996 .

[26]  P. Mahadevan,et al.  An overview , 2007, Journal of Biosciences.

[27]  Yannis Smaragdakis,et al.  DSD-Crasher: A hybrid analysis tool for bug finding , 2006, TSEM.

[28]  Wilhelm Hasselbring,et al.  Research issues in software fault categorization , 2007, SOEN.

[29]  Patrice Chalin,et al.  Are Practitioners Writing Contracts? , 2006, RODIN Book.

[30]  Sigrid Eldh Software Testing Techniques , 2007 .

[31]  Barton P. Miller,et al.  An empirical study of the reliability of UNIX utilities , 1990, Commun. ACM.

[32]  Sarfraz Khurshid,et al.  Korat: automated testing based on Java predicates , 2002, ISSTA '02.

[33]  Bertrand Meyer Attached Types and Their Application to Three Open Problems of Object-Oriented Programming , 2005, ECOOP.

[34]  Simeon C. Ntafos,et al.  An Evaluation of Random Testing , 1984, IEEE Transactions on Software Engineering.

[35]  Simeon C. Ntafos,et al.  On random and partition testing , 1998, ISSTA.

[36]  Claes Wohlin,et al.  Assuring fault classification agreement - an empirical evaluation , 2004, Proceedings. 2004 International Symposium on Empirical Software Engineering, 2004. ISESE '04..

[37]  Thomas J. Ostrand,et al.  Experiments on the effectiveness of dataflow- and control-flow-based test adequacy criteria , 1994, Proceedings of 16th International Conference on Software Engineering.

[38]  Robyn R. Lutz Targeting safety-related errors during software requirements analysis , 1993, SIGSOFT '93.

[39]  Alexander Pretschner,et al.  One evaluation of model-based testing and its automation , 2005, ICSE.

[40]  Dick Hamlet When only random testing will do , 2006, RT '06.

[41]  Inderpal S. Bhandari,et al.  Orthogonal Defect Classification - A Concept for In-Process Measurements , 1992, IEEE Trans. Software Eng..

[42]  MillerJames,et al.  Comparing and combining software defect detection techniques , 1997 .

[43]  Michael D. Ernst,et al.  Feedback-Directed Random Test Generation , 2007, 29th International Conference on Software Engineering (ICSE'07).

[44]  Yong Lei,et al.  Tool support for randomized unit testing , 2006, RT '06.

[45]  Elaine J. Weyuker,et al.  An Applicable Family of Data Flow Testing Criteria , 1988, IEEE Trans. Software Eng..

[46]  A. Jefferson Offutt,et al.  Inter-class mutation operators for Java , 2002, 13th International Symposium on Software Reliability Engineering, 2002. Proceedings..

[47]  Donald E. Knuth,et al.  The errors of tex , 1989, Softw. Pract. Exp..

[48]  John J. Marciniak,et al.  Encyclopedia of Software Engineering , 1994, Encyclopedia of Software Engineering.

[49]  Bertrand Meyer,et al.  Finding Faults: Manual Testing vs. Random+ Testing vs. User Reports , 2008, 2008 19th International Symposium on Software Reliability Engineering (ISSRE).

[50]  Alexander Pretschner Model-based testing , 2005, ICSE '05.

[51]  Michael D. Ernst,et al.  Dynamically discovering likely program invariants , 2000 .

[52]  W. Gutjahr Partition Testing versus Random Testing: the Innuence of Uncertainty , 1999 .

[53]  Bruno Legeard,et al.  A taxonomy of model-based testing , 2006 .

[54]  Yannis Smaragdakis,et al.  JCrasher: an automatic robustness tester for Java , 2004, Softw. Pract. Exp..

[55]  Victor R. Basili,et al.  Comparing the Effectiveness of Software Testing Strategies , 1987, IEEE Transactions on Software Engineering.

[56]  Bertrand Meyer,et al.  On the Predictability of Random Tests for Object-Oriented Software , 2008, 2008 1st International Conference on Software Testing, Verification, and Validation.