Private Causal Inference using Propensity Scores

Propensity score methods are commonly used in observational studies to reduce selection bias when estimating causal effects. Although such studies in econometrics, social science, and medicine often rely on sensitive data, there has been no prior work on privatising the propensity scores used to estimate causal effects from observed data. In this paper, we demonstrate how to privatise the propensity score and quantify how the noise added for privatisation affects both the propensity score and the subsequent causal inference. We test our methods on simulated and real-world datasets. The results are consistent with our theoretical finding that privatisation preserves the validity of the subsequent causal analysis with high probability. Our results also demonstrate empirically that the proposed solutions are practical for moderately sized datasets.
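
As an illustration of what privatising a propensity score can look like, the sketch below fits an L2-regularised logistic propensity model and adds noise to its weights by output perturbation, then uses the noisy scores in an inverse-propensity-weighted effect estimate. This is not necessarily the mechanism used in the paper: the choice of output perturbation, the function names (fit_logistic, privatise_weights, ipw_ate), and the parameters lam, epsilon, and clip are all assumptions made for illustration. The privacy calibration assumes every covariate row has Euclidean norm at most 1 and that the fit is an exact minimiser; plain gradient descent only approximates it.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, t, lam, steps=2000, lr=1.0):
    """Gradient descent on mean logistic loss + (lam/2)*||w||^2 (no intercept)."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(steps):
        grad = X.T @ (sigmoid(X @ w) - t) / n + lam * w
        w -= lr * grad
    return w

def privatise_weights(w, n, lam, epsilon, rng):
    """Output perturbation (assumed mechanism): add noise with density
    proportional to exp(-(n*lam*epsilon/2)*||b||), which matches an L2
    sensitivity of 2/(n*lam) when all rows satisfy ||x|| <= 1."""
    d = w.shape[0]
    radius = rng.gamma(shape=d, scale=2.0 / (n * lam * epsilon))
    direction = rng.normal(size=d)
    direction /= np.linalg.norm(direction)  # uniform direction on the sphere
    return w + radius * direction

def ipw_ate(X, t, y, w, clip=0.05):
    """Inverse-propensity-weighted average treatment effect estimate.
    Scores are clipped to [clip, 1 - clip] to keep the weights bounded."""
    e = np.clip(sigmoid(X @ w), clip, 1.0 - clip)
    return np.mean(t * y / e - (1.0 - t) * y / (1.0 - e))
```

Only the propensity model is privatised in this sketch. The effect estimate still reads the treatments and outcomes directly, so releasing it would need its own privacy analysis.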
