The analysis of ordinal agreement data: beyond weighted kappa.

The weighted kappa statistic has been used as an index of agreement for ordinal data. Using data on the comparability of primary and proxy respondent reports of alcohol drinking frequency, we show that the value of weighted kappa can be sensitive to the choice of weights. The distinction between association and agreement is clarified, and it is shown that in some respects weighted kappa behaves more like a measure of association than an index of agreement. In particular, it is demonstrated that the weighted kappa statistic is not always sensitive to differences in the observed proportion of exact agreement, and that high values of weighted kappa can be observed even when the level of agreement is low. We illustrate the use of statistical models in the analysis of epidemiologic agreement data and conclude that modelling ordinal agreement data produces insights that cannot be obtained through the use of weighted kappa statistics.
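The two claims about weighted kappa (sensitivity to the weights, and high values despite low exact agreement) can be checked on a small table. The sketch below uses the standard definition kappa_w = (p_o(w) - p_e(w)) / (1 - p_e(w)) with agreement weights w_ij that equal 1 on the diagonal. The 5-category table is entirely hypothetical and is not the alcohol-frequency data analysed in the paper. The weight functions `unweighted`, `linear` and `quadratic` are illustrative helper names for the usual unweighted, linear and quadratic agreement weights.

```python
from fractions import Fraction


def weighted_kappa(table, weight):
    """Weighted kappa for a k x k table of counts (rows: rater A, columns: rater B).

    `weight(i, j, k)` returns the agreement weight for cell (i, j); diagonal
    cells must have weight 1. Fraction keeps the arithmetic exact.
    """
    k = len(table)
    n = sum(sum(row) for row in table)
    p = [[Fraction(c, n) for c in row] for row in table]
    rows = [sum(p[i]) for i in range(k)]                       # marginals of rater A
    cols = [sum(p[i][j] for i in range(k)) for j in range(k)]  # marginals of rater B
    p_obs = sum(weight(i, j, k) * p[i][j] for i in range(k) for j in range(k))
    p_exp = sum(weight(i, j, k) * rows[i] * cols[j] for i in range(k) for j in range(k))
    return (p_obs - p_exp) / (1 - p_exp)


def unweighted(i, j, k):
    # Cohen's kappa: only exact agreement earns credit.
    return Fraction(1 if i == j else 0)


def linear(i, j, k):
    # Credit falls off linearly with the distance between categories.
    return 1 - Fraction(abs(i - j), k - 1)


def quadratic(i, j, k):
    # Credit falls off with squared distance, so near misses score highly.
    return 1 - Fraction((i - j) ** 2, (k - 1) ** 2)


def exact_agreement(table):
    """Observed proportion of subjects on the main diagonal."""
    n = sum(sum(row) for row in table)
    return Fraction(sum(table[i][i] for i in range(len(table))), n)


# Hypothetical table: rater B usually reports one category higher than
# rater A, a strong association but little exact agreement.
TABLE = [
    [0, 20, 0, 0, 0],
    [0, 0, 20, 0, 0],
    [0, 0, 0, 20, 0],
    [0, 0, 0, 0, 20],
    [0, 0, 0, 0, 20],
]

if __name__ == "__main__":
    print("exact agreement:", float(exact_agreement(TABLE)))
    for name, w in [("unweighted", unweighted), ("linear", linear), ("quadratic", quadratic)]:
        print(f"{name:>10} kappa: {float(weighted_kappa(TABLE, w)):.2f}")
```

Run as a script, this prints an exact agreement of 0.2 and kappas of 0.00 (unweighted), 0.50 (linear) and 0.80 (quadratic). The same table therefore yields anything from no agreement to "high" agreement depending only on the weights, while just 20% of subjects are classified identically. The quadratic value mostly reflects the systematic one-category shift, which is association rather than agreement.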
