Applying Inter-rater Reliability and Agreement in Grounded Theory Studies in Software Engineering

Context: In recent years, qualitative research in empirical software engineering that applies Grounded Theory has been increasing. Grounded Theory (GT) is a technique for developing theory inductively and iteratively from qualitative data; its main characteristics are theoretical sampling, coding, constant comparison, memoing, and saturation. Large or controversial GT studies may involve multiple researchers in collaborative coding, which requires a kind of rigor and consensus that an individual coder does not. Although many qualitative researchers reject quantitative measures in favor of other qualitative criteria, many others are committed to measuring consensus through Inter-Rater Reliability (IRR) and/or Inter-Rater Agreement (IRA) techniques to develop a shared understanding of the phenomenon being studied. However, there are no specific guidelines about how and when to apply IRR/IRA during the iterative process of GT, so researchers have been using ad hoc methods for years.
Objective: This paper presents a process for systematically applying IRR/IRA in GT studies that fits the iterative nature of this qualitative research method. The process is supported by a previous systematic literature review on applying IRR/IRA in GT studies in software engineering. It allows researchers to incrementally generate a theory while ensuring consensus on the constructs that support it, thus improving the rigor of qualitative research.
Method: Meta-science guided us to analyze the issues and challenges of the GT method when multiple raters are involved in coding procedures, and to formalize a process that improves collaborative team science and consortia.
Results: This process formalization helps researchers apply IRR/IRA to GT studies when multiple raters are involved in coding. Measuring consensus among raters promotes communicability, transparency, reflexivity, replicability, and trustworthiness of the research.
Conclusions: The application of this process to a GT study seems to support its feasibility. Pending further confirmation, this would represent a first step toward a de facto standard for those GT studies that require IRR/IRA.
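As an illustration of what measuring consensus between raters can look like in practice, the following minimal sketch computes raw percent agreement and Cohen's kappa for two raters who each assigned one nominal code to the same set of quotations. It is not the process proposed in the paper: the choice of Cohen's kappa, the function names, and the code labels ("culture", "automation", "sharing") are assumptions made only for this example, and other coefficients such as Krippendorff's alpha may be more appropriate for a given study.

```python
from collections import Counter


def percent_agreement(rater_a, rater_b):
    """Share of units to which both raters assigned the same code."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must code the same non-empty set of units.")
    return sum(a == b for a, b in zip(rater_a, rater_b)) / len(rater_a)


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters and nominal codes (chance-corrected agreement)."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    count_a = Counter(rater_a)
    count_b = Counter(rater_b)
    # Agreement expected by chance, from each rater's marginal code frequencies.
    codes = count_a.keys() | count_b.keys()
    expected = sum(count_a[c] * count_b[c] for c in codes) / (n * n)
    if expected == 1.0:
        # Both raters used a single identical code for every unit.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical codes assigned by two raters to the same ten quotations.
rater_a = ["culture"] * 4 + ["automation"] * 4 + ["sharing"] * 2
rater_b = ["culture"] * 3 + ["automation"] * 4 + ["sharing"] * 3

agreement = percent_agreement(rater_a, rater_b)
kappa = cohen_kappa(rater_a, rater_b)
print(f"percent agreement = {agreement:.2f}, Cohen's kappa = {kappa:.2f}")
```

In an iterative setting, a team might recompute such a coefficient after each coding round and discuss disagreements before continuing; the acceptance threshold is a decision for the research team rather than something fixed by this sketch.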
