Overinterpretation and misreporting of diagnostic accuracy studies: evidence of "spin".

PURPOSE To estimate the frequency of distorted presentation and overinterpretation of results in diagnostic accuracy studies.

MATERIALS AND METHODS MEDLINE was searched for diagnostic accuracy studies published between January and June 2010 in journals with an impact factor of 4 or higher. Articles included were primary studies of the accuracy of one or more tests in which the results were compared with a clinical reference standard. Two authors independently scored each article by using a pretested data-extraction form to identify actual overinterpretation and practices that facilitate overinterpretation, such as incomplete reporting of study methods or the use of inappropriate methods (potential overinterpretation). The frequency of overinterpretation was estimated in all studies and in a subgroup of imaging studies.

RESULTS Of the 126 articles, 39 (31%; 95% confidence interval [CI]: 23%, 39%) contained a form of actual overinterpretation, including 29 (23%; 95% CI: 16%, 30%) with an overly optimistic abstract, 10 (8%; 95% CI: 3%, 13%) with a discrepancy between the study aim and conclusion, and eight with conclusions based on selected subgroups. In our analysis of potential overinterpretation, authors of 89% (95% CI: 83%, 94%) of the studies did not include a sample size calculation, 88% (95% CI: 82%, 94%) did not state a test hypothesis, and 57% (95% CI: 48%, 66%) did not report CIs of accuracy measurements. In 43% (95% CI: 34%, 52%) of studies, authors were unclear about the intended role of the test, and in 3% (95% CI: 0%, 6%) they used inappropriate statistical tests. A subgroup analysis of imaging studies showed that 16 (30%; 95% CI: 17%, 43%) and 53 (100%; 95% CI: 92%, 100%) contained forms of actual and potential overinterpretation, respectively.

CONCLUSION Overinterpretation and misreporting of results in diagnostic accuracy studies are frequent in journals with high impact factors.
SUPPLEMENTAL MATERIAL http://radiology.rsna.org/lookup/suppl/doi:10.1148/radiol.12120527/-/DC1.
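
The abstract does not state how its confidence intervals were calculated. As a minimal sketch, assuming a normal-approximation (Wald) interval for a proportion, the Python below recomputes the intervals for the three counts reported out of 126 articles. The function names `wald_ci` and `as_percent` are hypothetical and chosen only for this illustration. The Wald interval is a convenient approximation, not necessarily the authors' method; it cannot, for example, produce the reported interval of 92% to 100% for 53 of 53 imaging studies, which suggests a different method was used at least for that figure.

```python
import math


def wald_ci(events, total, z=1.96):
    """Normal-approximation (Wald) interval for a proportion.

    Assumption: this is an illustrative method; the abstract does not
    say which interval the authors actually used.
    """
    p = events / total
    half_width = z * math.sqrt(p * (1 - p) / total)
    # Clamp to the valid range for a proportion.
    return max(0.0, p - half_width), min(1.0, p + half_width)


def as_percent(interval):
    """Round both interval bounds to whole percentages."""
    return tuple(round(100 * bound) for bound in interval)


# Counts taken from the RESULTS section (denominator: 126 articles).
reported = [
    ("actual overinterpretation", 39),
    ("overly optimistic abstract", 29),
    ("aim/conclusion discrepancy", 10),
]

for label, count in reported:
    low, high = as_percent(wald_ci(count, 126))
    print(f"{label}: {count}/126 = {round(100 * count / 126)}% "
          f"(95% CI: {low}%, {high}%)")
```

Under this assumption the rounded bounds agree with the intervals reported for these three counts (23% to 39%, 16% to 30%, and 3% to 13%).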
