Weighted 2 x 2 kappa coefficients: recommended indices of diagnostic accuracy for evidence-based practice.

OBJECTIVES
The diagnostic accuracy of a clinical test is typically evaluated by comparing how the test classifies individuals with how a diagnostic gold standard classifies them. The most popular indices of diagnostic accuracy are sensitivity, specificity, and the positive and negative predictive values. However, these measures are affected by the fact that some diagnostic decisions will be correct by chance. As a result, values can differ widely between indices, and clinicians may be unsure whether the results indicate that a test is good or poor. Evidence-based practice holds that decisions should rest on evidence rather than guesswork, so we might expect measures of diagnostic accuracy to be corrected for chance. The objective of this article was to advocate such correction and to draw attention to indices that meet this requirement.

STUDY DESIGN AND SETTING
The principles underlying calculations of diagnostic accuracy are presented as a framework for understanding the problem and its solution.

RESULTS
Disparities between different indices of diagnostic accuracy may be resolved by adjusting the indices to correct for chance effects. This produces a pair of weighted 2 x 2 "diagnostic" kappa coefficients that offer a number of theoretical and practical advantages.

CONCLUSION
Routine use of weighted 2 x 2 kappa coefficients as indices of diagnostic accuracy is recommended.
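
The chance correction summarized above can be illustrated with a short sketch. The abstract gives no formulas, so the following is an assumption: it uses Kraemer's weighted 2 x 2 kappa, kappa(r) = (ad - bc) / (r*P*Q' + (1 - r)*P'*Q). Here a, b, c, d are the cell proportions (true positive, false positive, false negative, true negative), P is the prevalence according to the gold standard, Q is the proportion testing positive, P' = 1 - P, and Q' = 1 - Q. Taking r = 1 and r = 0 gives a pair of coefficients. Algebraically, kappa(1) = (Se - Q)/Q' = (NPV - P')/P and kappa(0) = (Sp - Q')/Q = (PPV - P)/P'. Each coefficient therefore rescales a conventional index by the value chance alone would produce. The names `TwoByTwo`, `weighted_kappa`, and `diagnostic_indices`, and the counts in the usage example, are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class TwoByTwo:
    """Hypothetical container for test-versus-gold-standard counts."""
    tp: int  # test positive, condition present (cell a)
    fp: int  # test positive, condition absent (cell b)
    fn: int  # test negative, condition present (cell c)
    tn: int  # test negative, condition absent (cell d)

    def proportions(self) -> tuple[float, float, float, float]:
        """Return cell proportions (a, b, c, d), rejecting degenerate margins."""
        # Every margin must be non-empty, otherwise the indices divide by zero.
        if min(self.tp + self.fn, self.fp + self.tn,
               self.tp + self.fp, self.fn + self.tn) == 0:
            raise ValueError("every row and column margin must be non-zero")
        n = self.tp + self.fp + self.fn + self.tn
        return self.tp / n, self.fp / n, self.fn / n, self.tn / n


def weighted_kappa(t: TwoByTwo, r: float) -> float:
    """Assumed Kraemer form: kappa(r) = (ad - bc) / (r*P*Q' + (1 - r)*P'*Q).

    r = 0.5 reproduces Cohen's (unweighted) kappa.
    """
    if not 0.0 <= r <= 1.0:
        raise ValueError("r must lie in [0, 1]")
    a, b, c, d = t.proportions()
    p = a + c  # prevalence according to the gold standard
    q = a + b  # level of the test (proportion testing positive)
    return (a * d - b * c) / (r * p * (1 - q) + (1 - r) * (1 - p) * q)


def diagnostic_indices(t: TwoByTwo) -> dict[str, float]:
    """Conventional indices alongside the assumed chance-corrected pair."""
    a, b, c, d = t.proportions()
    p, q = a + c, a + b
    return {
        "sensitivity": a / p,
        "specificity": d / (1 - p),
        "ppv": a / q,
        "npv": d / (1 - q),
        # kappa(1) equals (Se - Q)/Q' and (NPV - P')/P
        "kappa_1": weighted_kappa(t, 1.0),
        # kappa(0) equals (Sp - Q')/Q and (PPV - P)/P'
        "kappa_0": weighted_kappa(t, 0.0),
    }


if __name__ == "__main__":
    # Hypothetical counts chosen only for illustration.
    table = TwoByTwo(tp=40, fp=30, fn=10, tn=120)
    for name, value in diagnostic_indices(table).items():
        print(f"{name}: {value:.3f}")
```

With these hypothetical counts, sensitivity and specificity are both 0.80 and the negative predictive value is about 0.92. The two chance-corrected coefficients are only about 0.69 and 0.43. This is the kind of disparity between raw and corrected indices that the abstract describes. Exposing a general `r` keeps the sketch small and shows that Cohen's kappa is the special case r = 0.5.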