Squibs and Discussions: The Kappa Statistic: A Second Look

In recent years, the kappa coefficient of agreement has become the de facto standard for evaluating intercoder agreement for tagging tasks. In this squib, we highlight issues that affect κ and that the community has largely neglected. First, we discuss the assumptions underlying different computations of the expected agreement component of κ. Second, we discuss how prevalence and bias affect the κ measure.
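To make the role of the expected agreement component concrete, here is a minimal Python sketch. It assumes two well-known formulations: Cohen's, which computes chance agreement from each coder's own category distribution, and the pooled variant associated with Scott, which uses a single distribution averaged over both coders. The abstract does not say which computations the squib compares, so this choice, the function names, and the toy labels are illustrative assumptions only.

```python
from collections import Counter

def observed_agreement(a, b):
    """Proportion of items on which the two coders chose the same label."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def expected_cohen(a, b):
    """Chance agreement using each coder's own label distribution (Cohen, 1960)."""
    n = len(a)
    count_a, count_b = Counter(a), Counter(b)
    return sum((count_a[c] / n) * (count_b[c] / n) for c in set(a) | set(b))

def expected_pooled(a, b):
    """Chance agreement using one distribution pooled over both coders (Scott, 1955)."""
    n = len(a)
    pooled = Counter(a) + Counter(b)
    return sum((pooled[c] / (2 * n)) ** 2 for c in pooled)

def kappa(p_a, p_e):
    """Chance-corrected agreement: (P(A) - P(E)) / (1 - P(E))."""
    return (p_a - p_e) / (1 - p_e)

# Hypothetical toy data: the coders differ in how often they say "yes" (bias).
coder1 = ["yes"] * 7 + ["no"] * 3
coder2 = ["yes"] * 5 + ["no"] * 5

p_a = observed_agreement(coder1, coder2)
kappa_cohen = kappa(p_a, expected_cohen(coder1, coder2))
kappa_pooled = kappa(p_a, expected_pooled(coder1, coder2))

print(f"P(A) = {p_a:.2f}")
print(f"kappa (Cohen, per-coder distributions) = {kappa_cohen:.2f}")
print(f"kappa (pooled distribution)            = {kappa_pooled:.2f}")
```

On this toy data the observed agreement is 0.80 in both cases, yet the per-coder computation gives κ = 0.60 while the pooled computation gives roughly 0.58. The gap comes entirely from the two coders' differing label distributions, which is one way bias between coders can show up in the measure.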

Barbara Di Eugenio | Michael Glass

[1] W. A. Scott, et al. Reliability of Content Analysis: The Case of Nominal Scale Coding, 1955.

[2] Bernice W. Polemis. Nonparametric Statistics for the Behavioral Sciences, 1959.

[3] Jacob Cohen. A Coefficient of Agreement for Nominal Scales, 1960.

[4] J. Fleiss. Measuring Nominal Scale Agreement Among Many Raters, 1971.

[5] P. Romano, et al. Letter to the Editor/In Reply, 1976.

[6] J. J. Bartko, et al. On the Methods and Theory of Reliability, 1976, The Journal of Nervous and Mental Disease.

[7] Klaus Krippendorff. Content Analysis: An Introduction to Its Methodology. Beverly Hills, CA: Sage, 1980.

[8] N. Andreasen, et al. Reliability Studies of Psychiatric Diagnosis: Theory and Practice, 1981, Archives of General Psychiatry.

[9] A. Feinstein, et al. High Agreement but Low Kappa: II. Resolving the Paradoxes, 1990, Journal of Clinical Epidemiology.

[10] C. C. Berry. The Kappa Statistic, 1992, JAMA.

[11] J. Carlin, et al. Bias, Prevalence and Kappa, 1993, Journal of Clinical Epidemiology.

[12] Toni Rietveld, et al. Statistical Techniques for the Study of Language and Language Behaviour, 1993.

[13] Jean Carletta, et al. Assessing Agreement on Classification Tasks: The Kappa Statistic, 1996, CL.

[14] Gwyneth Doherty-Sneddon, et al. The Reliability of a Dialogue Structure Coding Scheme, 1997, CL.

[15] Janyce Wiebe, et al. Development and Use of a Gold-Standard Data Set for Subjectivity Classifications, 1999, ACL.

[16] Johanna D. Moore, et al. The Agreement Process: An Empirical Investigation of Human-Human Computer-Mediated Collaborative Dialogs, 2000, Int. J. Hum. Comput. Stud.

[17] Barbara Di Eugenio, et al. On the Usage of Kappa to Evaluate Agreement on Coding Tasks, 2000, LREC.

[18] P. Shrout, et al. Fleiss, Joseph L †, 2005.