Conceptualizing Disagreement in Qualitative Coding

Collaborative qualitative coding often involves coders assigning different labels to the same data instance, which leads to ambiguity. We refer to such an instance of ambiguity as disagreement in coding. Analyzing the reasons for disagreement is essential, both for bolstering the user understanding gained from coding and reinterpreting the data collaboratively, and for negotiating user-assigned labels when building effective machine learning models. We propose a conceptual definition of collective disagreement based on the diversity and divergence within the coding distributions. This view of disagreement applies across diverse coding contexts and groups of coders, irrespective of discipline. We introduce two tree-based ranking metrics as standardized ways of comparing disagreements in how data instances have been coded. We empirically validate that, of the two tree-based metrics, coders' perceptions of disagreement match the n-ary tree metric more closely than the post-traversal tree metric.
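The abstract does not state how diversity and divergence are computed. As a minimal sketch, assuming diversity is the Shannon entropy of all labels pooled for one instance and divergence is the mean pairwise Jensen-Shannon divergence between individual coders' label distributions, the two quantities could be computed as below. These are stand-in measures chosen for illustration, not the paper's definitions and not its tree-based metrics; the function names `distribution`, `entropy`, `js_divergence`, and `disagreement` are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def distribution(labels):
    """Turn a list of codes into a probability distribution over codes."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {code: n / total for code, n in counts.items()}


def entropy(dist):
    """Shannon entropy in bits (assumed stand-in for 'diversity')."""
    return -sum(p * math.log2(p) for p in dist.values() if p > 0)


def js_divergence(p, q):
    """Jensen-Shannon divergence in bits between two code distributions."""
    codes = set(p) | set(q)
    m = {c: 0.5 * (p.get(c, 0.0) + q.get(c, 0.0)) for c in codes}
    return entropy(m) - 0.5 * (entropy(p) + entropy(q))


def disagreement(coder_labels):
    """Hypothetical (diversity, divergence) pair for ONE data instance.

    coder_labels maps each coder to the list of codes that coder
    assigned to the instance.
    """
    dists = [distribution(labels) for labels in coder_labels.values()]
    pooled = distribution(
        [code for labels in coder_labels.values() for code in labels]
    )
    diversity = entropy(pooled)
    pairs = list(combinations(dists, 2))
    # Average divergence over all coder pairs; zero when only one coder.
    divergence = (
        sum(js_divergence(p, q) for p, q in pairs) / len(pairs) if pairs else 0.0
    )
    return diversity, divergence


if __name__ == "__main__":
    instance = {"A": ["praise"], "B": ["praise"], "C": ["critique"]}
    print(disagreement(instance))
```

Instances could then be ranked by sorting on the (divergence, diversity) pair; that ordering is likewise only an illustration of comparing disagreement across instances.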
