Clinical calibration of DSM‐IV diagnoses in the World Mental Health (WMH) version of the World Health Organization (WHO) Composite International Diagnostic Interview (WMH‐CIDI)

An overview is presented of the rationale, design, and analysis plan for the WMH‐CIDI clinical calibration studies. As no clinical gold standard assessment is available for the DSM‐IV disorders assessed in the WMH‐CIDI, we adopted the goal of calibration rather than validation; that is, we asked whether WMH‐CIDI diagnoses are ‘consistent’ with diagnoses based on a state‐of‐the‐art clinical research diagnostic interview (SCID; Structured Clinical Interview for DSM‐IV) rather than whether they are ‘correct’. Consistency is evaluated both at the aggregate level (consistency of WMH‐CIDI and SCID prevalence estimates) and at the individual level (consistency of WMH‐CIDI and SCID diagnostic classifications). Although conventional statistics (sensitivity, specificity, Cohen's κ) are used to describe diagnostic consistency, an argument is made for considering the area under the receiver operating characteristic (ROC) curve (AUC) to be a more useful general‐purpose measure of consistency. In addition, more detailed analyses are used to evaluate consistency on a substantive level. These analyses begin by estimating prediction equations in a clinical calibration subsample, with WMH‐CIDI symptom‐level data used to predict SCID diagnoses, and then use the coefficients from these equations to assign predicted probabilities of SCID diagnoses to each respondent in the remainder of the sample. Substantive analyses then investigate whether prevalence estimates and associations based on WMH‐CIDI diagnoses are consistent with those based on predicted SCID diagnoses. Multiple imputation is used to adjust estimated standard errors for the imprecision introduced by SCID diagnoses being imputed under a model rather than measured directly. A brief illustration of this approach compares the precision of SCID and predicted SCID estimates of prevalence and correlates under varying sample designs. Copyright © 2004 Whurr Publishers Ltd.
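
As a rough illustration of the consistency measures named above, the following Python sketch computes aggregate-level prevalence estimates and individual-level sensitivity, specificity, Cohen's κ, and AUC from a 2×2 cross-classification of WMH‐CIDI against SCID diagnoses. The counts, the `TwoByTwo` class, and the `agreement_stats` function are hypothetical and are not taken from the calibration studies. The sketch relies on the fact that for a single dichotomous classification the ROC curve has one operating point, so the AUC reduces to the mean of sensitivity and specificity.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Hypothetical cross-classification of WMH-CIDI against SCID diagnoses."""
    tp: int  # CIDI positive, SCID positive
    fp: int  # CIDI positive, SCID negative
    fn: int  # CIDI negative, SCID positive
    tn: int  # CIDI negative, SCID negative


def agreement_stats(t: TwoByTwo) -> dict:
    """Return aggregate- and individual-level consistency measures."""
    n = t.tp + t.fp + t.fn + t.tn

    # Aggregate level: prevalence implied by each instrument.
    cidi_prev = (t.tp + t.fp) / n
    scid_prev = (t.tp + t.fn) / n

    # Individual level: conventional accuracy statistics,
    # treating the SCID as the reference classification.
    sensitivity = t.tp / (t.tp + t.fn)
    specificity = t.tn / (t.tn + t.fp)

    # Cohen's kappa: observed agreement corrected for the agreement
    # expected by chance given the two sets of marginal rates.
    p_obs = (t.tp + t.tn) / n
    p_exp = cidi_prev * scid_prev + (1 - cidi_prev) * (1 - scid_prev)
    kappa = (p_obs - p_exp) / (1 - p_exp)

    # AUC for a dichotomous classifier: mean of sensitivity and specificity.
    auc = (sensitivity + specificity) / 2

    return {
        "cidi_prevalence": cidi_prev,
        "scid_prevalence": scid_prev,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "kappa": kappa,
        "auc": auc,
    }


# Made-up counts for a calibration subsample of 1000 respondents.
example = TwoByTwo(tp=40, fp=20, fn=10, tn=930)
stats = agreement_stats(example)
for name, value in stats.items():
    print(f"{name}: {value:.3f}")
```

Unlike κ, the AUC computed this way does not depend on the marginal prevalence of the disorder, which is one reason it can serve as a general-purpose measure when disorders of very different prevalence are compared.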
