A meta‐analytic framework to adjust for bias in external control studies

While randomized controlled trials (RCTs) are the gold standard for estimating treatment effects in medical research, there is growing interest in using real-world data for drug development. One such use is the construction of external control arms to evaluate efficacy in single-arm trials, particularly where randomization is infeasible or unethical. However, it is well known that treated patients in non-randomized studies may not be comparable to control patients on either measured or unmeasured variables, and that the underlying population differences between the two groups may bias treatment effect estimates and increase variability in estimation. To address these challenges for analyses of time-to-event outcomes, we developed a meta-analytic framework that uses historical reference studies to adjust a log hazard ratio estimate in a new external control study for its additional bias and variability. The set of historical studies is formed by constructing external control arms for historical RCTs, and a meta-analysis compares the trial controls to the external control arms. Importantly, a prospective external control study can be performed independently of the meta-analysis using standard causal inference techniques for observational data. We illustrate our approach with a simulation study and an empirical example based on reference studies for advanced non-small cell lung cancer. In our empirical analysis, external control patients had lower survival than trial controls (hazard ratio: 0.907), but our methodology is able to correct for this bias. An implementation of our approach is available in the R package ecmeta.

Keywords: External controls, meta-analysis, bias, survival analysis, RCTs

arXiv:2110.03827v1 [stat.ME] 7 Oct 2021
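The adjustment described in the abstract can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the paper: a simple frequentist random-effects meta-analysis (the DerSimonian-Laird estimator) stands in for whatever model the authors fit, the bias in each reference study is taken to be the log hazard ratio of trial controls versus external controls, and all function names and numbers are hypothetical. It does not reproduce the ecmeta API.

```python
import math


def dersimonian_laird(log_hrs, ses):
    """Random-effects meta-analysis of per-study bias estimates.

    log_hrs: log hazard ratios (trial control vs. external control),
             one per historical reference study (assumed convention).
    ses:     their standard errors.
    Returns (mu, var_mu, tau2): pooled mean bias, its variance, and the
    between-study variance.
    """
    k = len(log_hrs)
    if k < 2:
        raise ValueError("need at least two reference studies")
    w = [1.0 / se ** 2 for se in ses]
    sw = sum(w)
    mu_fixed = sum(wi * y for wi, y in zip(w, log_hrs)) / sw
    # Cochran's Q and the method-of-moments estimate of tau^2
    q = sum(wi * (y - mu_fixed) ** 2 for wi, y in zip(w, log_hrs))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)
    # Re-weight with the between-study variance included
    w_re = [1.0 / (se ** 2 + tau2) for se in ses]
    mu = sum(wi * y for wi, y in zip(w_re, log_hrs)) / sum(w_re)
    var_mu = 1.0 / sum(w_re)
    return mu, var_mu, tau2


def adjust_log_hr(log_hr_new, se_new, mu, var_mu, tau2):
    """Adjust a new treatment-vs-external-control log HR.

    Under the assumed convention,
    log HR(trt vs. trial control) = log HR(trt vs. external control)
                                    - log HR(trial control vs. external control),
    so the pooled bias is subtracted and its uncertainty is added.
    """
    adjusted = log_hr_new - mu
    se_adjusted = math.sqrt(se_new ** 2 + tau2 + var_mu)
    return adjusted, se_adjusted


# Hypothetical reference studies (illustrative values only)
ref_log_hrs = [-0.15, -0.05, -0.20, 0.02, -0.10]
ref_ses = [0.10, 0.12, 0.15, 0.11, 0.09]
mu, var_mu, tau2 = dersimonian_laird(ref_log_hrs, ref_ses)

# Hypothetical new single-arm study compared with an external control arm
adj, se_adj = adjust_log_hr(math.log(0.70), 0.12, mu, var_mu, tau2)
print("adjusted HR:", math.exp(adj), "adjusted SE (log scale):", se_adj)
```

The adjusted standard error is never smaller than the unadjusted one, which reflects the abstract's point that an external control comparison carries additional variability on top of its bias.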
