Using differences among replications of software engineering experiments to gain knowledge

In no science or engineering discipline does it make sense to speak of isolated experiments. The results of a single experiment cannot be viewed as representative of the underlying reality. The concept of experiment is closely related to replication. Experiment replication is the repetition of an experiment to double-check its results. Multiple replications of an experiment increase the credibility of its results. Software engineering has attempted to repeat experiments identically, as is done in the natural sciences (physics, chemistry, etc.). Despite numerous attempts over the years, no exact replications have yet been achieved, except for experiments repeated by the same researchers at the same site. One key reason for this is the complexity of the software development setting, which prevents the many experimental conditions from being reproduced identically. This paper reports research into whether non-exact replications can be of any use. We propose a process that allows researchers to generate new knowledge when running non-exact replications. To illustrate the advantages of the proposed process, two different replications of an experiment are presented.
