A Clinical Trial Derived Reference Set for Evaluating Observational Study Methods

Informing clinical practice in a learning health system requires reliable causal inference methods for estimating the effects of drugs from observational data. However, these methods rely on untestable assumptions, which makes it difficult to evaluate whether a method's drug effect estimates are reliable or to compare the performance of different methods. In this work, we provide a reference set of the effects of drugs on various side effects, based on public clinical trial data. We improve on prior reference sets in two ways: we construct a consistent statistical definition of positive and negative controls, and we derive our controls from clinical trials instead of drug package inserts or the literature. We also provide an example application in which we use the reference set to evaluate a suite of causal inference methods on observational medical claims data. In doing so, we find that treatment effects estimated using inverse propensity weighting, with propensity scores estimated via machine learning, accurately separate the positive controls from the negative controls in our reference set.
