Federated Estimation of Causal Effects from Observational Data

Many modern applications collect data in a federated fashion, with each dataset kept locally and undisclosed. To date, most work on causal inference has required the data to be stored in a central repository. We present a novel framework for causal inference with federated data sources. We assess and integrate local causal effects from different private data sources without centralizing them. We then estimate the treatment effects on subjects from observational data using a non-parametric reformulation of the classical potential outcomes framework. We model the potential outcomes as random functions distributed according to Gaussian processes, whose defining parameters can be learned efficiently from multiple data sources while respecting privacy constraints. We demonstrate the promise and efficiency of the proposed approach through a set of simulated and real-world benchmark examples.
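
To make the idea concrete, the sketch below illustrates one simple way a treatment effect could be estimated without pooling raw data. It is not the method proposed in the paper: it swaps in a random-Fourier-feature approximation of an RBF-kernel Gaussian process, fits one Bayesian linear model per treatment arm, and has each source share only aggregated sufficient statistics. All function names, the simulated data, and the hyperparameters (`noise_var`, `prior_var`, the number of features) are assumptions made for illustration, and sharing such statistics is not by itself a formal privacy guarantee.

```python
import numpy as np

def rff(x, W, b):
    """Random Fourier features approximating an RBF-kernel Gaussian process."""
    return np.sqrt(2.0 / W.shape[1]) * np.cos(x @ W + b)

def local_stats(x, t, y, W, b):
    """Runs at one source: returns per-arm sufficient statistics, not raw rows."""
    stats = {}
    for arm in (0, 1):
        phi = rff(x[t == arm], W, b)
        stats[arm] = (phi.T @ phi, phi.T @ y[t == arm])
    return stats

def aggregate_and_fit(all_stats, noise_var=0.1, prior_var=10.0):
    """Runs at the server: sums statistics and returns posterior-mean weights per arm."""
    weights = {}
    for arm in (0, 1):
        A = sum(s[arm][0] for s in all_stats)
        c = sum(s[arm][1] for s in all_stats)
        ridge = (noise_var / prior_var) * np.eye(A.shape[0])
        weights[arm] = np.linalg.solve(A + ridge, c)
    return weights

def estimate_ate(x, weights, W, b):
    """Average difference between the two predicted potential outcomes."""
    phi = rff(x, W, b)
    return float(np.mean(phi @ weights[1] - phi @ weights[0]))

# Hypothetical simulation: three sources, confounded treatment, true effect 2.0.
rng = np.random.default_rng(0)
dim, n_features = 2, 100
W = rng.normal(size=(dim, n_features))            # shared by all sources
b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)

sources = []
for _ in range(3):
    x = rng.normal(size=(500, dim))
    p = 1.0 / (1.0 + np.exp(-x[:, 0]))            # treatment depends on x
    t = (rng.uniform(size=500) < p).astype(int)
    y = np.sin(x[:, 0]) + 0.5 * x[:, 1] + 2.0 * t + 0.1 * rng.normal(size=500)
    sources.append((x, t, y))

# Each source shares only its statistics; the server fits the two outcome models.
weights = aggregate_and_fit([local_stats(x, t, y, W, b) for x, t, y in sources])

# Each source evaluates the effect on its own covariates; only the means are averaged
# (an unweighted average is valid here because the sources have equal size).
ate = float(np.mean([estimate_ate(x, weights, W, b) for x, _, _ in sources]))
```

Because the per-arm statistics are plain sums over rows, the federated fit matches what a centralized fit on the pooled data would give up to floating-point error. A full Gaussian process treatment, as described in the abstract, would also need to learn kernel hyperparameters across sources, which this sketch fixes in advance.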
