Power Weighted Versions of Bennett, Alpert, and Goldstein's S

A weighted version of Bennett, Alpert, and Goldstein's S, defined with power weights, is studied. It is shown that its special cases are often ordered in the same way. It is also shown that many of its special cases tend to produce values close to unity, especially when the number of categories of the rating scale is large. It is argued that the application of the coefficient as an agreement measure is not without difficulties.
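For orientation, the unweighted coefficient of Bennett, Alpert, and Goldstein for a rating scale with c categories and observed proportion of agreement P_o is

$$S = \frac{c P_o - 1}{c - 1}.$$

A weighted analogue can be written in the style of Cohen's weighted kappa, with expected agreement computed under uniform category probabilities. The following is a sketch only; the agreement weights w_{ij} shown are the familiar power-weight family (linear for r = 1, quadratic for r = 2) and are an assumed illustration rather than the paper's exact definition:

$$S_w = \frac{\sum_{i,j} w_{ij}\, p_{ij} - \tfrac{1}{c^2}\sum_{i,j} w_{ij}}{1 - \tfrac{1}{c^2}\sum_{i,j} w_{ij}}, \qquad w_{ij} = 1 - \frac{|i-j|^r}{(c-1)^r},$$

where p_{ij} is the proportion of objects assigned to category i by the first rater and category j by the second. With identity weights (w_{ij} = 1 if i = j and 0 otherwise) this expression reduces to the unweighted S above.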
