Trends in the Quality of Human-Centric Software Engineering Experiments--A Quasi-Experiment

Context: Several textbooks and papers published between 2000 and 2002 have attempted to introduce experimental design and statistical methods to software engineers undertaking empirical studies.

Objective: This paper investigates whether the quality of human-centric experimental and quasi-experimental journal papers increased over the period 1993 to 2010.

Method: Seventy experimental and quasi-experimental papers published in four general software engineering journals in the years 1993-2002 and 2006-2010 were each assessed for quality by three empirical software engineering researchers using two quality assessment methods (a questionnaire-based method and a subjective overall assessment). Regression analysis was used to assess the relationship between paper quality and the year of publication, publication date group (before 2003 and after 2005), source journal, average coauthor experience, citation of statistical textbooks and papers, and paper length. The results were validated both by removing papers for which the quality score appeared unreliable and by using an alternative quality measure.

Results: Paper quality was significantly associated with year, citing general statistical texts, and paper length (p < 0.05). Paper length did not reach significance when quality was measured using the overall subjective assessment.

Conclusions: The quality of experimental and quasi-experimental software engineering papers appears to have improved gradually since 1993.
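As an illustration only, the kind of regression described in the Method can be sketched as an ordinary least squares fit of a paper-level quality score on years since 1993 plus a dummy variable for the later publication date group (after 2005). This minimal Python sketch solves the normal equations directly. The function names `ols` and `solve_linear_system`, the predictor coding, and all data values are hypothetical assumptions; the abstract does not state the exact model specification, the software used, or the underlying scores.

```python
"""Illustrative sketch: OLS regression of a quality score on publication year
and a before/after date-group dummy.

ASSUMPTION: all names and data here are hypothetical and only exercise the
code; they are not the model or the scores used in the study.
"""
from typing import List, Sequence


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix so each row operation updates a and b together.
    m = [row[:] + [b_i] for row, b_i in zip(a, b)]
    for col in range(n):
        # Pick the row with the largest absolute value in this column.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        # Eliminate this column from all rows below the pivot.
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def ols(rows: Sequence[Sequence[float]], y: Sequence[float]) -> List[float]:
    """Return OLS coefficients [intercept, b1, b2, ...] via (X'X) b = X'y."""
    x = [[1.0, *map(float, row)] for row in rows]  # prepend intercept column
    p = len(x[0])
    xtx = [[sum(xi[i] * xi[j] for xi in x) for j in range(p)] for i in range(p)]
    xty = [sum(xi[i] * yi for xi, yi in zip(x, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


if __name__ == "__main__":
    # Hypothetical papers: (years since 1993, 1 if published after 2005 else 0).
    predictors = [(0, 0), (3, 0), (6, 0), (9, 0), (13, 1), (15, 1), (17, 1)]
    # Hypothetical paper-level quality scores (e.g. averaged over raters).
    quality = [0.50, 0.58, 0.55, 0.66, 0.72, 0.70, 0.78]
    intercept, per_year, group_shift = ols(predictors, quality)
    print(f"intercept={intercept:.3f} per-year={per_year:.4f} "
          f"group-shift={group_shift:.3f}")
```

Coding the date group as a 0/1 dummy alongside the year lets the model separate a gradual trend (the year coefficient) from a step change between the two publication periods. A real analysis would also need standard errors and p-values, which a statistics package provides and this sketch omits.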