Empirical Evaluation of Approaches to Testing Applications without Test Oracles

Software testing of applications in fields such as scientific computing, simulation, and machine learning is particularly challenging because many applications in these domains have no reliable “test oracle” to indicate whether the program’s output is correct for arbitrary input. A common approach to testing such applications is the “pseudo-oracle”, in which multiple independently developed implementations of an algorithm process the same input and their results are compared: if the results disagree, then at least one of the implementations contains a defect. Other approaches include the use of program invariants, formal specification languages, trace and log file analysis, and metamorphic testing. In this paper, we present the results of two empirical studies comparing the effectiveness of some of these approaches, including metamorphic testing and runtime assertion checking. The results demonstrate that metamorphic testing is generally more effective at revealing defects in applications without test oracles across various application domains, including programs with non-deterministic output. We also analyze the results in terms of the software development process, and offer suggestions for both practitioners and researchers who need to test software without the help of a test oracle.
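To make the idea concrete, the following is a minimal sketch of metamorphic testing, the technique the studies evaluate. The function under test (`mean`) and the specific metamorphic relations are illustrative choices, not taken from the paper: even when no oracle can confirm that a particular output is correct, a tester can check relations that any correct implementation must satisfy between outputs on related inputs.

```python
def mean(xs):
    """Hypothetical function under test; no oracle tells us the 'right' answer."""
    return sum(xs) / len(xs)

def check_metamorphic_relations(xs, tol=1e-9):
    """Check illustrative metamorphic relations for an arithmetic mean."""
    original = mean(xs)

    # MR1: permuting the input must not change the output.
    assert abs(mean(list(reversed(xs))) - original) < tol

    # MR2: multiplying every element by a constant k scales the output by k.
    k = 3.0
    assert abs(mean([k * x for x in xs]) - k * original) < tol

    # MR3: adding a constant c to every element shifts the output by c.
    c = 5.0
    assert abs(mean([x + c for x in xs]) - (original + c)) < tol

# A violation of any relation reveals a defect without ever knowing
# the "expected" output for the original input.
check_metamorphic_relations([1.0, 2.0, 4.0, 8.0])
```

By contrast, the pseudo-oracle approach described above would run a second, independently developed `mean` implementation on the same input and compare the two results; metamorphic testing needs only the single implementation.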
