Category Distinguishability and Observer Agreement

Summary. It is common in the medical, biological, and social sciences for the categories into which an object is classified to lack a fully objective definition. Strictly speaking, the categories are therefore not completely distinguishable. The practical extent of their distinguishability can be measured when two expert observers classify the same sample of objects. It is shown, under reasonable assumptions, that the matrix of joint classification probabilities is quasi-symmetric, and that its symmetric component is non-negative definite. The degree of distinguishability between two categories is defined and is used to give a measure of overall category distinguishability. It is argued that the kappa measure of observer agreement is unsatisfactory as a measure of overall category distinguishability.
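
The following is a minimal sketch, assuming a two-observer joint classification table (rows: observer A, columns: observer B). The kappa computation follows Cohen's standard formula; the symmetric/skew-symmetric decomposition and the non-negative-definiteness check merely illustrate the matrix properties referred to above, and are not the paper's distinguishability measure itself. All function names and the example counts are hypothetical.

```python
import numpy as np

def cohen_kappa(table):
    """Cohen's kappa from a square table of joint classification counts
    or probabilities for two observers."""
    p = np.asarray(table, dtype=float)
    p = p / p.sum()                      # normalise to joint probabilities
    po = np.trace(p)                     # observed agreement
    pe = p.sum(axis=1) @ p.sum(axis=0)   # chance agreement from the margins
    return (po - pe) / (1.0 - pe)

def symmetric_component(table):
    """Symmetric part of the joint probability matrix and a check of
    whether it is non-negative definite (all eigenvalues >= 0)."""
    p = np.asarray(table, dtype=float)
    p = p / p.sum()
    s = 0.5 * (p + p.T)                  # symmetric component
    eigvals = np.linalg.eigvalsh(s)
    return s, bool(np.all(eigvals >= -1e-12))

# Hypothetical counts: two observers classifying the same objects
counts = np.array([[50,  5,  2],
                   [ 4, 40,  6],
                   [ 1,  7, 35]])

print("kappa =", round(cohen_kappa(counts), 3))
s, nnd = symmetric_component(counts)
print("symmetric component non-negative definite:", nnd)
```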
