Using Experimental Data to Evaluate Methods for Observational Causal Inference

Methods that infer causal dependence from observational data are central to many areas of science, including medicine, economics, and the social sciences. A variety of theoretical properties of these methods have been proven, but empirical evaluation remains a challenge, largely because few observational data sets exist for which the treatment effect is known. We propose and analyze observational sampling from randomized controlled trials (OSRCT), a method for evaluating causal inference methods using data from randomized controlled trials (RCTs). OSRCT creates constructed observational data sets with corresponding unbiased estimates of treatment effect, substantially increasing the number of data sets available for evaluating causal inference methods. We show that, in expectation, OSRCT creates data sets that are equivalent to those produced by randomly sampling from empirical data sets in which all potential outcomes are available. We analyze several properties of OSRCT theoretically and empirically, and we demonstrate its use by comparing the performance of four causal inference methods using data from eleven RCTs.
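The abstract does not spell out the sampling procedure, so the following is only a minimal sketch of one way a constructed observational data set could be drawn from RCT data. It assumes a covariate-dependent biasing probability: a treatment is drawn for each unit from that probability, and the unit is kept only when the drawn treatment matches the one actually assigned in the trial. The function names (`osrct_sample`, `simulate_rct`, `difference_in_means`), the logistic biasing function, and the simulated data are all hypothetical and chosen for illustration.

```python
import math
import random


def osrct_sample(units, biasing_prob, rng):
    """Sub-sample RCT units so that treatment depends on the covariate.

    Assumed scheme (not taken from the abstract): draw a treatment from a
    covariate-dependent probability and keep the unit only if the draw
    matches the treatment the unit actually received in the trial.
    """
    kept = []
    for covariate, treatment, outcome in units:
        p_treat = biasing_prob(covariate)
        drawn = 1 if rng.random() < p_treat else 0
        if drawn == treatment:
            kept.append((covariate, treatment, outcome))
    return kept


def difference_in_means(units):
    """Naive effect estimate: mean treated outcome minus mean control outcome."""
    treated = [y for _, t, y in units if t == 1]
    control = [y for _, t, y in units if t == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)


def simulate_rct(n, rng, effect=2.0):
    """Hypothetical RCT: fair-coin treatment, outcome driven by one covariate."""
    units = []
    for _ in range(n):
        c = rng.gauss(0.0, 1.0)
        t = rng.randint(0, 1)
        y = effect * t + 3.0 * c + rng.gauss(0.0, 1.0)
        units.append((c, t, y))
    return units


rng = random.Random(0)
rct = simulate_rct(20000, rng)

# Hypothetical biasing function: units with larger covariates are more
# likely to end up in the treated group of the constructed sample.
logistic = lambda c: 1.0 / (1.0 + math.exp(-2.0 * c))
observational = osrct_sample(rct, logistic, rng)

rct_estimate = difference_in_means(rct)              # reference effect estimate
naive_estimate = difference_in_means(observational)  # confounded by the covariate
```

In this sketch the full RCT supplies the reference effect estimate, while the sub-sample behaves like confounded observational data: a naive difference in means on it is biased, which leaves something for a causal inference method to correct by adjusting for the covariate.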
