Improving reproducibility by using high-throughput observational studies with empirical calibration

Concerns over reproducibility in science extend to research using existing healthcare data; many observational studies investigating the same topic produce conflicting results, even when using the same data. To address this problem, we propose a paradigm shift. The current paradigm centres on generating one estimate at a time using a unique study design with unknown reliability and publishing (or not) one estimate at a time. The new paradigm advocates for high-throughput observational studies using consistent and standardized methods, allowing evaluation, calibration and unbiased dissemination to generate a more reliable and complete evidence base. We demonstrate this new paradigm by comparing all depression treatments for a set of outcomes, producing 17 718 hazard ratios, each using methodology on par with current best practice. We furthermore include control hypotheses to evaluate and calibrate our evidence generation process. Results show good transitivity and consistency between databases, and agree with four out of the five findings from clinical trials. The distribution of effect size estimates reported in the literature reveals an absence of small or null effects, with a sharp cut-off at p = 0.05. No such phenomena were observed in our results, suggesting more complete and more reliable evidence. This article is part of a discussion meeting issue ‘The growing ubiquity of algorithms in society: implications, impacts and innovations’.
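As a rough illustration of what calibrating against control hypotheses can involve, the sketch below fits an empirical null distribution to estimates from negative controls and uses it to compute a calibrated p-value. It rests on assumptions not stated in the abstract: the controls are taken to be negative controls (exposure-outcome pairs whose true hazard ratio is believed to be 1), the systematic error is taken to be normally distributed on the log scale, and the null is fitted by simple moments that ignore each control's own standard error. The function names and the numbers are hypothetical, and this is not presented as the authors' implementation.

```python
import math
from statistics import NormalDist, mean, pstdev


def fit_empirical_null(neg_control_log_hrs):
    """Fit a normal 'empirical null' to negative-control log hazard ratios.

    Simplification (assumption): moment estimates that ignore the sampling
    error of each individual control estimate.
    """
    mu = mean(neg_control_log_hrs)
    sigma = pstdev(neg_control_log_hrs, mu)
    return mu, sigma


def traditional_p_value(log_hr, se):
    """Two-sided p-value assuming only random error (null centred on 0)."""
    z = log_hr / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def calibrated_p_value(log_hr, se, null_mu, null_sigma):
    """Two-sided p-value against the empirical null.

    The null variance combines systematic error (null_sigma) with the
    estimate's own random error (se).
    """
    total_sd = math.sqrt(null_sigma ** 2 + se ** 2)
    z = (log_hr - null_mu) / total_sd
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical negative-control estimates (log hazard ratios), biased upward.
controls = [0.1, 0.3, 0.2, 0.4, 0.0]
mu, sigma = fit_empirical_null(controls)

# A hypothetical estimate of interest: log HR 0.2 with standard error 0.1.
p_traditional = traditional_p_value(0.2, 0.1)
p_calibrated = calibrated_p_value(0.2, 0.1, mu, sigma)
print(f"traditional p = {p_traditional:.3f}, calibrated p = {p_calibrated:.3f}")
```

In this toy setting the estimate looks significant under the traditional test, but it sits exactly at the centre of the bias seen in the negative controls, so the calibrated p-value does not flag it.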


[23]  A R Feinstein,et al.  A collection of 56 topics with contradictory results in case-control research. , 1988, International journal of epidemiology.

[24]  B Rosner,et al.  Postmenopausal estrogen and progestin use and the risk of cardiovascular disease. , 1996, The New England journal of medicine.

[25]  L. García-Rodríguez,et al.  Hormone replacement therapy and incidence of acute myocardial infarction. A population-based nested case-control study. , 2000, Circulation.

[26]  J. Ioannidis Why Most Published Research Findings Are False , 2005, PLoS medicine.

[27]  Rachel Churchill,et al.  Sertraline versus other antidepressive agents for depression. , 2009, The Cochrane database of systematic reviews.

[28]  Jon Duke,et al.  Consistency in the safety labeling of bioequivalent medications , 2013, Pharmacoepidemiology and drug safety.

[29]  Halil Kilicoglu,et al.  Constructing a semantic predication gold standard from the biomedical literature , 2011, BMC Bioinformatics.

[30]  M. Lindquist,et al.  Zoo or Savannah? Choice of Training Ground for Evidence-Based Pharmacovigilance , 2014, Drug Safety.

[31]  P. Sterzer,et al.  Born to be criminal? What to make of early biological risk factors for criminal behavior. , 2010, The American journal of psychiatry.

[32]  James M Robins,et al.  Using Big Data to Emulate a Target Trial When a Randomized Trial Is Not Available. , 2016, American journal of epidemiology.

[33]  C P Farrington,et al.  Relative incidence estimation from case series for vaccine safety evaluation. , 1995, Biometrics.

[34]  Patrick B. Ryan,et al.  Massive Parallelization of Serial Inference Algorithms for a Complex Generalized Linear Model , 2012, TOMC.

[35]  Mary Cushman,et al.  Estrogen plus progestin and the risk of coronary heart disease. , 2003, The New England journal of medicine.

[36]  Patrick B. Ryan,et al.  Accuracy of an automated knowledge base for identifying drug adverse reactions , 2017, J. Biomed. Informatics.

[37]  William DuMouchel,et al.  Interpreting observational studies: why empirical calibration is needed to correct p-values , 2013, Statistics in medicine.

[38]  Arnaud Doucet,et al.  Introduction to Special Issue on Monte Carlo Methods in Statistics , 2013, TOMC.

[39]  B. Arnold,et al.  Brief Report: Negative Controls to Detect Selection Bias and Measurement Bias in Epidemiologic Studies , 2016, Epidemiology.

[40]  D. Altman,et al.  Measuring inconsistency in meta-analyses , 2003, BMJ : British Medical Journal.

[41]  C. Cooper,et al.  Use of statins and risk of fractures. , 2001, JAMA.

[42]  George Hripcsak,et al.  Risk of angioedema associated with levetiracetam compared with phenytoin: Findings of the observational health data sciences and informatics research network , 2017, Epilepsia.

[43]  Jacob J Hughey,et al.  Discovering Cross-Reactivity in Urine Drug Screening Immunoassays through Large-Scale Analysis of Electronic Health Records. , 2019, Clinical chemistry.

[44]  Gerald Gartlehner,et al.  Drug Class Review: Second Generation Antidepressants: Final Report Update 4 , 2008 .

[45]  Gerald Gartlehner,et al.  Drug Class Review: Second-Generation Antidepressants , 2011 .

[46]  D. Madigan,et al.  Evaluating the impact of database heterogeneity on observational study results. , 2013, American journal of epidemiology.

[47]  Frank de Vries,et al.  Reanalysis of two studies with contrasting results on the association between statin use and fracture risk: the General Practice Research Database. , 2006, International journal of epidemiology.