Strength of evidence in systematic reviews in software engineering

Systematic reviews are only as good as the evidence on which they are based. It is important, therefore, that users of systematic reviews know how much confidence they can place in the conclusions and recommendations arising from such reviews. In this paper, we present an overview of some of the most influential systems for assessing the quality of individual primary studies and for grading the overall strength of a body of evidence. We also present an example of the use of such systems, based on a systematic review of empirical studies of agile software development. Our findings suggest that the systems used in other disciplines for grading the strength of evidence and for reporting systematic reviews, especially those that take account of qualitative and observational studies, are of particular relevance for software engineering.
