ARRESTT: A framework to create reproducible experiments to evaluate software testing techniques

Researchers have reported that software testing techniques (STT) generally lack empirical evidence, and empirical studies in this field are still maturing. Furthermore, researchers in software engineering often neglect the validation of existing experiments. Both executing and reproducing experiments are essential to validate the scientific findings reported in the literature, yet tools and frameworks to support these tasks are scarce. We propose a framework named ARRESTT that aids experimenters in creating and reproducing experiments. We validate ARRESTT by reproducing a known experiment on test case selection techniques, obtaining results very similar to those of the original experiment. Based on an evaluation of reproducibility attributes, we conclude that ARRESTT enhances the reproducibility of an experiment while requiring little effort to configure and execute it.
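For concreteness, below is a minimal sketch of a similarity-based test case selection technique in the spirit of the reproduced experiment. It assumes test cases are represented as sets of covered model transitions and uses Jaccard similarity as the similarity function; the names `select_tests`, `jaccard`, and the example suite are illustrative assumptions, not part of ARRESTT or the original experiment:

```python
from itertools import combinations

def jaccard(a: set, b: set) -> float:
    """Jaccard similarity between two sets of covered transitions."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def select_tests(suite: dict, budget: int) -> dict:
    """Greedily discard one test of the most similar pair until only
    `budget` tests remain; the survivors form the selected suite."""
    selected = dict(suite)  # test name -> set of covered transitions
    while len(selected) > budget:
        # Find the pair of remaining tests with the highest similarity.
        t1, t2 = max(combinations(selected, 2),
                     key=lambda p: jaccard(selected[p[0]], selected[p[1]]))
        # Drop the test of the pair that covers fewer transitions.
        victim = t1 if len(selected[t1]) <= len(selected[t2]) else t2
        del selected[victim]
    return selected

if __name__ == "__main__":
    suite = {
        "tc1": {"s0->s1", "s1->s2"},
        "tc2": {"s0->s1", "s1->s3"},
        "tc3": {"s0->s4", "s4->s5", "s5->s2"},
    }
    print(sorted(select_tests(suite, budget=2)))
```

In a reproducibility framework such as ARRESTT, a technique like this would be one configurable, swappable component of the experiment, so that reruns vary only the factors under study.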
