Paper information - On the Usage of Kappa to Evaluate Agreement on Coding Tasks

On the Usage of Kappa to Evaluate Agreement on Coding Tasks

In recent years, the Kappa coefficient of agreement has become the de facto standard to evaluate intercoder agreement in the discourse and dialogue processing community. Together with the adoption of this standard, researchers have adopted one specific scale to evaluate Kappa values, the one proposed in (Krippendorff, 1980). In this position paper, I highlight some issues that should be taken into account when evaluating Kappa values. Finally, I speculate on whether Kappa could be used as a measure to evaluate a system’s performance.

Barbara Di Eugenio | Barbara Maria Di Eugenio

[1] Jean Carletta et al. Assessing Agreement on Classification Tasks: The Kappa Statistic, 1996, CL.

[2] Johanna D. Moore et al. An Empirical Investigation of Proposals in Collaborative Dialogues, 1998, ACL.

[3] J. Searle. What is a Speech Act, 1996.

[4] Thomas G. Dietterich. What is machine learning? 2020, Archives of Disease in Childhood.

[5] Toni Rietveld et al. Statistical Techniques for the Study of Language and Language Behaviour, 1993.

[6] K. Krippendorff. Content Analysis: An Introduction to Its Methodology. Beverly Hills, CA: Sage, 1980.

[7] S. Siegel et al. Nonparametric Statistics for the Behavioral Sciences, 2022, The SAGE Encyclopedia of Research Design.

[8] David R. Traum et al. Discourse Obligations in Dialogue Processing, 1994, ACL.

[9] Janyce Wiebe et al. Development and Use of a Gold-Standard Data Set for Subjectivity Classifications, 1999, ACL.

[10] Rebecca J. Passonneau. Applying Reliability Metrics to Co-Reference Annotation, 1997, ArXiv.

[11] N. Andreasen et al. Reliability Studies of Psychiatric Diagnosis: Theory and Practice, 1981, Archives of General Psychiatry.

[12] Klaus Krippendorff et al. Content Analysis: An Introduction to Its Methodology, 1980.

[13] M. Black. Philosophy in America, 1965.

[14] Jacob Cohen. A Coefficient of Agreement for Nominal Scales, 1960.

[15] Johanna D. Moore et al. The Agreement Process: An Empirical Investigation of Human-Human Computer-Mediated Collaborative Dialogs, 2000, Int. J. Hum. Comput. Stud.