Assessing Agreement on Classification Tasks: The Kappa Statistic

Currently, computational linguists and cognitive scientists working in the area of discourse and dialogue support claims that their subjective judgments are reliable with several different statistics, none of which is easily interpretable or comparable with the others. Meanwhile, researchers in content analysis have already faced the same difficulties and arrived at a solution in the kappa statistic. We discuss what is wrong with reliability measures as they are currently used for discourse and dialogue work in computational linguistics and cognitive science, and argue that the field would be better off adopting techniques from content analysis.
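
For readers unfamiliar with the statistic, kappa corrects the observed proportion of agreement P(A) for the agreement P(E) expected by chance: K = (P(A) - P(E)) / (1 - P(E)). The sketch below is illustrative only and is not taken from the paper. It assumes exactly two coders and estimates chance agreement from each coder's own label distribution (the Cohen-style variant); other variants pool the label distributions across coders. The function name `kappa` and the sample labels are hypothetical.

```python
from collections import Counter


def kappa(labels_a, labels_b):
    """Two-coder kappa: (P(A) - P(E)) / (1 - P(E)).

    Assumption: chance agreement P(E) is estimated from each coder's
    own label distribution (Cohen-style), not a pooled distribution.
    """
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)

    # P(A): proportion of items on which the two coders agree.
    p_agree = sum(a == b for a, b in zip(labels_a, labels_b)) / n

    # P(E): probability that the coders agree by chance, summed over categories.
    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    p_chance = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)

    if p_chance == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    return (p_agree - p_chance) / (1 - p_chance)


# Hypothetical data: two coders labelling four units as "x" or "y".
coder_1 = ["x", "x", "y", "y"]
coder_2 = ["x", "x", "y", "x"]
print(kappa(coder_1, coder_2))
```

Here the coders agree on three of four units (P(A) = 0.75), but half of that agreement would be expected by chance (P(E) = 0.5), so kappa is 0.5 rather than 0.75. A value of 1 indicates perfect agreement and 0 indicates agreement no better than chance.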
