Selective Cutoff Reporting in Studies of Diagnostic Test Accuracy: A Comparison of Conventional and Individual-Patient-Data Meta-Analyses of the Patient Health Questionnaire-9 Depression Screening Tool

In studies of diagnostic test accuracy, authors sometimes report results only for a range of cutoff points around data-driven "optimal" cutoffs. We assessed selective cutoff reporting in studies of the diagnostic accuracy of the Patient Health Questionnaire-9 (PHQ-9) depression screening tool. We compared conventional meta-analysis of published results only with individual-patient-data meta-analysis of results derived from all cutoff points, using data from 13 of 16 studies published during 2004-2009 that were included in a published conventional meta-analysis. For the "standard" PHQ-9 cutoff of 10, accuracy results had been published by 11 of the studies. For all other relevant cutoffs, 3-6 studies published accuracy results. For all cutoffs examined, specificity estimates in conventional and individual-patient-data meta-analyses were within 1% of each other. Sensitivity estimates were similar for the cutoff of 10 but differed by 5%-15% for other cutoffs. In samples where the PHQ-9 was poorly sensitive at the standard cutoff, authors tended to report results for lower cutoffs that yielded optimal results. When the PHQ-9 was highly sensitive, authors more often reported results for higher cutoffs. Consequently, in the conventional meta-analysis, sensitivity increased as cutoff severity increased across part of the cutoff range, which is impossible when all data are analyzed. In sum, selective reporting by primary study authors of only results from cutoffs that perform well in their study can bias accuracy estimates in meta-analyses of published results.
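The mechanism described above can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the two studies, their PHQ-9 scores, and the cutoffs each one "published" are invented, and sensitivity is pooled by simply summing counts across studies, which is a stand-in for, not a reproduction of, the meta-analytic models used in the study. A participant screens positive when the score is at or above the cutoff.

```python
# Toy illustration with made-up PHQ-9 scores; not data from the study.
# Each list holds the scores of participants who have major depression (cases).
CASES = {
    "study_A": [6, 7, 8, 9, 10, 11, 13, 15],      # PHQ-9 poorly sensitive at 10
    "study_B": [9, 11, 12, 13, 14, 16, 18, 20],   # PHQ-9 highly sensitive at 10
}

# Hypothetical cutoffs each study chose to publish: the low-sensitivity study
# reports lower cutoffs, the high-sensitivity study reports higher ones.
REPORTED = {
    "study_A": {8, 9, 10},
    "study_B": {10, 11, 12},
}


def pooled_sensitivity(cutoff, only_reported):
    """Pool true positives and cases across studies by simple summation."""
    true_pos = total = 0
    for study, scores in CASES.items():
        if only_reported and cutoff not in REPORTED[study]:
            continue  # this study did not publish results for this cutoff
        true_pos += sum(score >= cutoff for score in scores)
        total += len(scores)
    return true_pos / total if total else None


cutoffs = range(8, 13)
all_data = [pooled_sensitivity(c, only_reported=False) for c in cutoffs]
published = [pooled_sensitivity(c, only_reported=True) for c in cutoffs]

for c, full, pub in zip(cutoffs, all_data, published):
    print(f"cutoff {c}: all data {full:.3f}, published only {pub:.3f}")
```

With all data, pooled sensitivity can only stay level or fall as the cutoff rises, because every participant who screens positive at a higher cutoff also screens positive at a lower one. In the "published only" series the set of contributing studies changes from cutoff to cutoff, so the pooled estimate rises between cutoffs 9 and 11 even though no individual study shows such a rise.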
