The effect of combining categories on Bennett, Alpert and Goldstein's S

Abstract: Cohen's kappa is the most widely used descriptive measure of interrater agreement on a nominal scale. A measure that has repeatedly been proposed in the literature as an alternative to Cohen's kappa is Bennett, Alpert and Goldstein's S. The latter measure is equivalent to Janson and Vegelius' C and Brennan and Prediger's kappa_n. An agreement table can be collapsed into a table of smaller size by partitioning the categories into subsets. The paper presents several results on how the overall S-value is related to the S-values of the collapsed tables. It is shown that, if the categories are partitioned into subsets of the same size and we consider all collapsed tables of this partition type, then the overall S-value is equal to the average S-value of the collapsed tables. This result illustrates that some ways of partitioning the categories do not, on average, result in a loss of information in terms of the S-value. In addition, it is proved that for all other partition types the overall S-value is strictly smaller than the average S-value of the collapsed tables. A consequence is that there is always at least one way to combine categories such that the S-value increases. The S-value increases if we combine categories between which there is considerable disagreement.
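As an illustration of the averaging result (a minimal numerical sketch, not taken from the paper itself): for two raters classifying objects into k categories, S can be computed as S = (k * P_o - 1) / (k - 1), where P_o is the observed proportion of agreement, i.e. the sum of the diagonal cells of the table of proportions. The Python sketch below uses a hypothetical 4 x 4 table of counts and the three partitions of four categories into two pairs of equal size; all function names, variable names and counts are illustrative assumptions.

```python
import numpy as np

def s_value(table):
    """Bennett, Alpert and Goldstein's S for a square agreement table.

    S = (k * P_o - 1) / (k - 1), where k is the number of categories and
    P_o is the observed proportion of agreement (sum of the diagonal).
    """
    p = np.asarray(table, dtype=float)
    p = p / p.sum()                 # turn counts into proportions
    k = p.shape[0]
    return (k * np.trace(p) - 1.0) / (k - 1.0)

def collapse(table, partition):
    """Collapse a k x k table according to a partition of the categories.

    `partition` is a list of index lists, e.g. [[0, 1], [2, 3]].
    """
    p = np.asarray(table, dtype=float)
    m = len(partition)
    out = np.zeros((m, m))
    for a, block_a in enumerate(partition):
        for b, block_b in enumerate(partition):
            out[a, b] = p[np.ix_(block_a, block_b)].sum()
    return out

# Hypothetical counts for two raters and four categories (not from the paper).
counts = [[14, 3, 2, 1],
          [4, 10, 3, 2],
          [1, 2, 9, 4],
          [2, 1, 3, 11]]

# The three partitions of four categories into two subsets of size two.
partitions = [[[0, 1], [2, 3]],
              [[0, 2], [1, 3]],
              [[0, 3], [1, 2]]]

s_full = s_value(counts)
s_collapsed = [s_value(collapse(counts, part)) for part in partitions]

print(f"overall S           : {s_full:.5f}")
print("collapsed S-values  :", [f"{s:.5f}" for s in s_collapsed])
print(f"average collapsed S : {np.mean(s_collapsed):.5f}")
```

With these made-up counts the overall S is about 0.481, the three collapsed S-values are about 0.611, 0.389 and 0.444, and their average is again 0.481, in line with the equal-size-subsets result. Note that one collapsed value (0.611) exceeds the overall value: combining the two categories with the largest disagreement counts increases S, as the abstract states.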

[1]  C L Janes,et al.  An Extension of the Random Error Coefficient of Agreement to N x N Tables , 1979, British Journal of Psychiatry.

[2]  C. Lantz,et al.  Behavior and interpretation of the κ statistic: Resolution of the two paradoxes , 1996 .

[3]  Milton Abramowitz,et al.  Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables , 1964 .

[4]  L. Cronbach Coefficient alpha and the internal structure of tests , 1951 .

[5]  Klaas Sijtsma,et al.  On the Use, the Misuse, and the Very Limited Usefulness of Cronbach’s Alpha , 2008, Psychometrika.

[6]  M. Banerjee,et al.  Beyond kappa: A review of interrater agreement measures , 1999 .

[7]  Alexander von Eye,et al.  Analyzing Rater Agreement: Manifest Variable Methods , 2004 .

[8]  Matthijs J. Warrens A family of multi-rater kappas that can always be increased and decreased by combining categories , 2012 .

[9]  Matthijs J. Warrens Inequalities Between Kappa and Kappa-Like Statistics for k×k Tables , 2010 .

[10]  A E Maxwell,et al.  Coefficients of Agreement Between Observers and Their Interpretation , 1977, British Journal of Psychiatry.

[11]  Justus J. Randolph Free-Marginal Multirater Kappa (multirater K_free): An Alternative to Fleiss' Fixed-Marginal Multirater Kappa. , 2005 .

[12]  W. Willett,et al.  Misinterpretation and misuse of the kappa statistic. , 1987, American journal of epidemiology.

[13]  Matthijs J. Warrens,et al.  Inequalities between multi-rater kappas , 2010, Adv. Data Anal. Classif..

[14]  W. A. Scott,et al.  Reliability of Content Analysis: The Case of Nominal Scale Coding , 1955 .

[15]  J. R. Landis,et al.  The measurement of observer agreement for categorical data. , 1977, Biometrics.

[16]  J. Vegelius,et al.  On Generalizations Of The G Index And The Phi Coefficient To Nominal Scales. , 1979, Multivariate behavioral research.

[17]  Alexander von Eye,et al.  On the Marginal Dependency of Cohen’s κ , 2008 .

[18]  Matthijs J. Warrens,et al.  Cohen's kappa can always be increased and decreased by combining categories , 2010 .

[19]  James C. Reed Book Reviews: Visual Perceptual Abilities and Early Reading Progress by Jean Turner Goins, Supplementary Educational Monographs, #87, Chicago: University of Chicago Press, 1958, pp. x + 108 , 1960 .

[20]  D. Owen Handbook of Mathematical Functions with Formulas , 1965 .

[21]  J. Guilford,et al.  A Note on the G Index of Agreement , 1964 .

[22]  A. E. Maxwell Comparing the Classification of Subjects by Two Independent Judges , 1970, British Journal of Psychiatry.

[23]  R. Alpert,et al.  Communications Through Limited-Response Questioning , 1954 .

[24]  Jeroen de Mast,et al.  Measurement system analysis for categorical measurements: Agreement and kappa-type indices , 2007 .

[25]  Rebecca Zwick,et al.  Another look at interrater agreement. , 1988, Psychological bulletin.

[26]  Art Noda,et al.  Kappa coefficients in medical research , 2002, Statistics in medicine.

[27]  John S. Uebersax,et al.  Diversity of decision-making models and the measurement of interrater agreement. , 1987 .

[28]  Nambury S. Raju,et al.  A generalization of coefficient alpha , 1977 .

[29]  Klaus Krippendorff,et al.  Association, agreement, and equity , 1987 .

[30]  G. Meyer Assessing Reliability: Critical Corrections for a Critical Examination of the Rorschach Comprehensive System. , 1997 .

[31]  Jacob Cohen A Coefficient of Agreement for Nominal Scales , 1960 .

[32]  A. Feinstein,et al.  High agreement but low kappa: I. The problems of two paradoxes. , 1990, Journal of clinical epidemiology.

[33]  Dale J. Prediger,et al.  Coefficient Kappa: Some Uses, Misuses, and Alternatives , 1981 .

[34]  H. Kraemer,et al.  2 x 2 kappa coefficients: measures of agreement or association. , 1989, Biometrics.

[35]  L. Hsu,et al.  Interrater Agreement Measures: Comments on Kappa_n, Cohen's Kappa, Scott's π, and Aickin's α , 2003 .

[36]  J. D. Mast Agreement and Kappa-Type Indices , 2007 .

[37]  Adelin Albert,et al.  Agreement between Two Independent Groups of Raters , 2009 .

[38]  R. Peterson,et al.  Interjudge Agreement and the Maximum Value of Kappa , 1989 .

[39]  E. Ross,et al.  Philosophy of Science Association , 2022 .

[40]  S D Walter,et al.  A reappraisal of the kappa coefficient. , 1988, Journal of clinical epidemiology.

[41]  Matthijs J. Warrens,et al.  Weighted kappa is higher than Cohen's kappa for tridiagonal agreement tables , 2011 .

[42]  Werner Vach,et al.  The dependence of Cohen's kappa on the prevalence does not matter. , 2005, Journal of clinical epidemiology.

[43]  A. Donner,et al.  The effect of collapsing multinomial data when assessing agreement. , 2000, International journal of epidemiology.

[44]  Alexander von Eye,et al.  Analyzing rater agreement , 2013 .

[45]  B. Everitt,et al.  Statistical methods for rates and proportions , 1973 .

[46]  A. Agresti An introduction to categorical data analysis , 1997 .

[47]  M. Warrens On Similarity Coefficients for 2×2 Tables and Correction for Chance , 2008, Psychometrika.

[48]  C. Mann,et al.  A Practical Treatise on Diseases of the Skin , 1889, Atlanta Medical and Surgical Journal (1884).

[49]  H. Kraemer Ramifications of a population model for κ as a coefficient of reliability , 1979 .

[50]  Matthijs J. Warrens,et al.  Cohen's kappa is a weighted average , 2011 .

[51]  Hubert J. A. Schouten,et al.  Nominal scale agreement among observers , 1986 .

[52]  A. Gregoriades,et al.  Assessing the reliability of socio‐technical systems , 2002 .

[53]  Matthijs J. Warrens,et al.  On the Equivalence of Cohen’s Kappa and the Hubert-Arabie Adjusted Rand Index , 2008, J. Classif..

[54]  Matthijs J. Warrens,et al.  A Formal Proof of a Paradox Associated with Cohen’s Kappa , 2010, J. Classif..

[55]  M. Warrens On Association Coefficients for 2×2 Tables and Properties That Do Not Depend on the Marginal Distributions , 2008, Psychometrika.