Systematic Literature Reviews in Software Engineering - Enhancement of the Study Selection Process using Cohen's Kappa Statistic

Context: Systematic literature reviews (SLRs) rely on a rigorous and auditable methodology to minimize bias and ensure reliability. A common kind of bias arises when studies are selected using a set of inclusion/exclusion criteria. This bias can be reduced through dual review, but dual review makes the selection process more time-consuming and is still prone to bias, because each researcher may interpret the inclusion/exclusion criteria differently. Objective: To reduce both the bias and the time spent in the study selection process, this paper presents a process for selecting studies based on Cohen's Kappa statistic. We define an iterative process in which this statistic is used to refine the criteria until almost perfect agreement (κ > 0.8) is obtained. At this point, the two researchers interpret the selection criteria in the same way, and the bias is therefore reduced. Once this agreement is reached, dual review can be eliminated, and the time spent is consequently shortened drastically. Method: The feasibility of this iterative process for selecting studies is demonstrated through a tertiary study in the area of software engineering covering works published from 2005 to 2018. Results: The time saved in the study selection process was 28% (for 152 studies), and if the number of studies is sufficiently large, the time saved tends asymptotically to 50%. Conclusions: Researchers and students may take advantage of this iterative process for selecting studies when conducting SLRs to reduce bias in the interpretation of inclusion and exclusion criteria. It is especially useful for research with few resources.
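As an illustration of the agreement check at the core of this process, the following is a minimal sketch of Cohen's Kappa for two researchers who each label the same studies as included or excluded. The function name `cohen_kappa`, the labels "I"/"E", and the ten sample decisions are assumptions made for this example; they are not data or code from the paper.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's Kappa for two raters labelling the same items.

    kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed
    agreement and p_e is the agreement expected by chance.
    """
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("both raters must label the same non-empty set of items")

    n = len(ratings_a)

    # Observed agreement: fraction of items with identical labels.
    p_o = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n

    # Chance agreement: product of each rater's marginal proportions,
    # summed over all labels used by either rater.
    count_a = Counter(ratings_a)
    count_b = Counter(ratings_b)
    labels = set(count_a) | set(count_b)
    p_e = sum(count_a[label] * count_b[label] for label in labels) / (n * n)

    if p_e == 1.0:
        # Both raters used one and the same label for every item.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical decisions for ten studies: "I" = include, "E" = exclude.
researcher_a = ["I", "I", "I", "I", "E", "E", "E", "E", "E", "E"]
researcher_b = ["I", "I", "I", "E", "E", "E", "E", "E", "E", "E"]

kappa = cohen_kappa(researcher_a, researcher_b)
print(f"kappa = {kappa:.3f}")
if kappa > 0.8:
    print("Almost perfect agreement: dual review could stop here.")
else:
    print("Agreement too low: refine the criteria and repeat.")
```

In this invented sample the two researchers disagree on one study out of ten, so the observed agreement is 0.9, yet the statistic stays below the 0.8 threshold once chance agreement is discounted. Under the process described above, that outcome would call for another round of refining the criteria.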
