Is mutation an appropriate tool for testing experiments?

The empirical assessment of test techniques plays an important role in software testing research. One common practice is to instrument faults, either manually or by using mutation operators. The latter allows the systematic, repeatable seeding of large numbers of faults; however, we do not know whether empirical results obtained this way lead to valid, representative conclusions. This paper investigates this important question based on a number of programs with comprehensive pools of test cases and known faults. It is concluded that, based on the data available thus far, the use of mutation operators is yielding trustworthy results (generated mutants are similar to real faults). Mutants appear however to be different from hand-seeded faults that seem to be harder to detect than real faults.

[1]  Gregg Rothermel,et al.  Can fault‐exposure‐potential estimates improve the fault detection abilities of test suites? , 2002, Softw. Test. Verification Reliab..

[2]  Phyllis G. Frankl,et al.  An experimental comparison of the effectiveness of the all-uses and all-edges adequacy criteria , 1991, TAV4.

[3]  Gregg Rothermel,et al.  An empirical study of regression test selection techniques , 2001, ACM Trans. Softw. Eng. Methodol..

[4]  A. Jefferson Offutt,et al.  Mutation 2000: uniting the orthogonal , 2001 .

[5]  Lionel C. Briand,et al.  Using simulation to empirically investigate test coverage criteria based on statechart , 2004, Proceedings. 26th International Conference on Software Engineering.

[6]  Gregg Rothermel,et al.  Empirical Studies of a Safe Regression Test Selection Technique , 1998, IEEE Trans. Software Eng..

[7]  James H. Andrews,et al.  General Test Result Checking with Log File Analysis , 2003, IEEE Trans. Software Eng..

[8]  Richard G. Hamlet,et al.  Testing Programs with the Aid of a Compiler , 1977, IEEE Transactions on Software Engineering.

[9]  A. Jefferson Offutt,et al.  Detecting equivalent mutants and the feasible path problem , 1996, Proceedings of 11th Annual Conference on Computer Assurance. COMPASS '96.

[10]  A. Jefferson Offutt,et al.  Investigations of the software testing coupling effect , 1992, TSEM.

[11]  Gregg Rothermel,et al.  An empirical study of regression test selection techniques , 1998, Proceedings of the 20th International Conference on Software Engineering.

[12]  Marc J. Balcer,et al.  The category-partition method for specifying and generating fuctional tests , 1988, CACM.

[13]  Thomas J. Ostrand,et al.  Experiments on the effectiveness of dataflow- and control-flow-based test adequacy criteria , 1994, Proceedings of 16th International Conference on Software Engineering.

[14]  Gregg Rothermel,et al.  An experimental determination of sufficient mutant operators , 1996, TSEM.

[15]  Yves Crouzet,et al.  An experimental study on software structural testing: deterministic versus random input generation , 1991, [1991] Digest of Papers. Fault-Tolerant Computing: The Twenty-First International Symposium.

[16]  John A. Clark,et al.  Investigating the effectiveness of object‐oriented testing strategies using the mutation method , 2001 .

[17]  Dana Angluin,et al.  Two notions of correctness and their relation to testing , 1982, Acta Informatica.

[18]  Michael D. Ernst,et al.  Improving test suites via operational abstraction , 2003, 25th International Conference on Software Engineering, 2003. Proceedings..

[19]  Atif M. Memon,et al.  What test oracle should I use for effective GUI testing? , 2003, 18th IEEE International Conference on Automated Software Engineering, 2003. Proceedings..

[20]  Gregg Rothermel,et al.  Infrastructure support for controlled experimentation with software testing and regression testing techniques , 2004, Proceedings. 2004 International Symposium on Empirical Software Engineering, 2004. ISESE '04..

[21]  A. Jefferson Offutt,et al.  Automatically detecting equivalent mutants and infeasible paths , 1997 .

[22]  Richard J. Lipton,et al.  Hints on Test Data Selection: Help for the Practicing Programmer , 1978, Computer.

[23]  Phyllis G. Frankl,et al.  Further empirical studies of test effectiveness , 1998, SIGSOFT '98/FSE-6.

[24]  Phyllis G. Frankl,et al.  Empirical evaluation of the textual differencing regression testing technique , 1998, Proceedings. International Conference on Software Maintenance (Cat. No. 98CB36272).