Evaluating guidelines for reporting empirical software engineering studies

Background: Several researchers have criticized the standards of performing and reporting empirical studies in software engineering. To address this problem, Jedlitschka and Pfahl produced reporting guidelines for controlled experiments in software engineering, and they pointed out that their guidelines needed evaluation. We agree that guidelines need to be evaluated before they can be widely adopted.

Aim: The aim of this paper is to present the method we used to evaluate the guidelines and to report the results of our evaluation exercise. We suggest that our evaluation process may be of more general use if reporting guidelines for other types of empirical study are developed.

Method: We used a reading method inspired by perspective-based and checklist-based reviews to perform a theoretical evaluation of the guidelines. The perspectives used were Researcher, Practitioner/Consultant, Meta-analyst, Replicator, Reviewer, and Author. Apart from the Author perspective, the reviews were based on a set of questions derived by brainstorming. A separate review was performed for each perspective. The review using the Author perspective considered each section of the guidelines sequentially.

Results: The reviews detected 44 issues where the guidelines would benefit from amendment or clarification, and 8 defects.

Conclusions: Reporting guidelines need to specify what information goes into which section and avoid excessive duplication. The current guidelines need to be revised and then subjected to further theoretical and empirical validation. Perspective-based checklists are a useful validation method, but the Practitioner/Consultant perspective presents difficulties.

Categories and Subject Descriptors: K.6.3 [Software Engineering]: Software Management - Software process.

General Terms: Management, Experimentation.
