The Causal-Neural Connection: Expressiveness, Learnability, and Inference

One of the central elements of any causal inference is an object called a structural causal model (SCM), which represents a collection of mechanisms and exogenous sources of random variation of the system under investigation (Pearl, 2000). An important property of many kinds of neural networks is universal approximability: the ability to approximate any function to arbitrary precision. Given this property, one may be tempted to surmise that a collection of neural nets is capable of learning any SCM by training on data generated by that SCM. In this paper, we show this is not the case by disentangling the notions of expressivity and learnability. Specifically, we show that the causal hierarchy theorem (Thm. 1, Bareinboim et al., 2020), which describes the limits of what can be learned from data, still holds for neural models. For instance, an arbitrarily complex and expressive neural net is unable to predict the effects of interventions given observational data alone. Given this result, we introduce a special type of SCM called a neural causal model (NCM), and formalize a new type of inductive bias to encode the structural constraints necessary for performing causal inferences. Building on this new class of models, we focus on solving two canonical tasks found in the literature, known as causal identification and estimation. Leveraging the neural toolbox, we develop an algorithm that is both sufficient and necessary to determine whether a causal effect can be learned from data (i.e., causal identifiability); it then estimates the effect whenever identifiability holds (causal estimation). Simulations corroborate the proposed approach.
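To make the NCM construction concrete, below is a minimal sketch in PyTorch, assuming a two-variable model with an unobserved confounder (X ← U → Y, X → Y). The class names (MLP, NCM), the architecture, and the uniform noise distribution are illustrative assumptions rather than the authors' implementation; the point is only that each structural mechanism is replaced by a feedforward net while the exogenous noise and the graph structure are fixed, so the same parameters answer both observational and interventional queries.

```python
# A sketch of a neural causal model (NCM): each structural mechanism
# f_V(pa_V, u_V) of the SCM M = <U, V, F, P(U)> is a feedforward net.
# Graph, architecture, and noise distribution are illustrative assumptions.
import torch
import torch.nn as nn

class MLP(nn.Module):
    """One neural mechanism f_V(pa_V, u_V) -> v."""
    def __init__(self, in_dim, out_dim=1, hidden=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, out_dim), nn.Sigmoid(),  # values in (0, 1), soft-binary
        )

    def forward(self, *args):
        return self.net(torch.cat(args, dim=-1))

class NCM(nn.Module):
    """NCM for X -> Y with shared exogenous noise U (unobserved confounding)."""
    def __init__(self, noise_dim=1):
        super().__init__()
        self.noise_dim = noise_dim
        self.f_x = MLP(noise_dim)      # X := f_X(U)
        self.f_y = MLP(1 + noise_dim)  # Y := f_Y(X, U)

    def forward(self, n, do_x=None):
        u = torch.rand(n, self.noise_dim)  # sample exogenous P(U); uniform here
        # An intervention do(X = x) surgically replaces f_X with the constant x;
        # f_Y and P(U) are untouched, so no retraining is involved.
        x = self.f_x(u) if do_x is None else torch.full((n, 1), float(do_x))
        y = self.f_y(x, u)
        return x, y

# The same parameters induce both layers of the hierarchy:
model = NCM()
x_obs, y_obs = model(n=1000)        # L1: samples from P(X, Y)
_, y_do = model(n=1000, do_x=1.0)   # L2: samples from P(Y | do(X = 1))
```

Because interventions are realized by mechanism replacement rather than retraining, one can, in principle, fit such a model to the observational distribution P(X, Y) and then ask whether every parameterization consistent with that fit agrees on P(Y | do(X)); when the maximal and minimal values of the query coincide, the effect is identifiable, which is the spirit of the identification-and-estimation procedure described above.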

[1] Allan Pinkus et al. Multilayer Feedforward Networks with a Non-Polynomial Activation Function Can Approximate Any Function. Neural Networks, 1991.

[2] Frank Hutter et al. Decoupled Weight Decay Regularization. ICLR, 2017.

[3] Shakir Mohamed et al. Variational Inference with Normalizing Flows. ICML, 2015.

[4] Elias Bareinboim et al. Structural Causal Bandits: Where to Intervene? NeurIPS, 2018.

[5] J. Pearl. Causal diagrams for empirical research. 1995.

[6] David M. Blei et al. Adapting Neural Networks for the Estimation of Treatment Effects. NeurIPS, 2019.

[7] Luca Antiga et al. Automatic differentiation in PyTorch. 2017.

[8] Max Welling et al. Auto-Encoding Variational Bayes. ICLR, 2013.

[9] Frank Hutter et al. SGDR: Stochastic Gradient Descent with Warm Restarts. ICLR, 2016.

[10] Richard S. Sutton et al. Reinforcement Learning: An Introduction. IEEE Transactions on Neural Networks, 1998.

[11] Kyunghyun Cho et al. A Framework For Contrastive Self-Supervised Learning And Designing A New Approach. arXiv, 2020.

[12] Navdeep Jaitly et al. Towards End-To-End Speech Recognition with Recurrent Neural Networks. ICML, 2014.

[13] Sivaraman Balakrishnan et al. Semiparametric Counterfactual Density Estimation. Biometrika, 2021.

[14] Mihaela van der Schaar et al. GANITE: Estimation of Individualized Treatment Effects using Generative Adversarial Nets. ICLR, 2018.

[15] Jin Tian et al. A general identification condition for causal effects. AAAI/IAAI, 2002.

[16] Michèle Sebag et al. Learning Functional Causal Models with Generative Neural Networks. 2018.

[17] Max Welling et al. Causal Effect Inference with Deep Latent-Variable Models. NIPS, 2017.

[18] Elias Bareinboim et al. Local Characterizations of Causal Bayesian Networks. GKR, 2011.

[19] Karthikeyan Shanmugam et al. Experimental Design for Learning Causal Graphs with Latent Variables. NIPS, 2017.

[20] Hugo Larochelle et al. MADE: Masked Autoencoder for Distribution Estimation. ICML, 2015.

[21] Judea Pearl et al. Counterfactual Probabilities: Computational Methods, Bounds and Applications. UAI, 1994.

[22] Karthikeyan Shanmugam et al. Characterization and Learning of Causal Graphs with Latent Variables from Soft Interventions. NeurIPS, 2019.

[23] G. Bray et al. A clinical trial of the effects of dietary patterns on blood pressure. DASH Collaborative Research Group. The New England Journal of Medicine, 1997.

[24] Jin Tian et al. Estimating Identifiable Causal Effects through Double Machine Learning. AAAI, 2021.

[25] David M. Blei et al. Variational Inference: A Review for Statisticians. arXiv, 2016.

[26] Yun Fu et al. Matching on Balanced Nonlinear Representations for Treatment Effects Estimation. NIPS, 2017.

[27] Jimmy Ba et al. Adam: A Method for Stochastic Optimization. ICLR, 2014.

[28] Judea Pearl et al. The Book of Why: The New Science of Cause and Effect. 2018.

[29] Uri Shalit et al. Estimating individual treatment effect: generalization bounds and algorithms. ICML, 2016.

[30] Uri Shalit et al. Generalization Bounds and Representation Learning for Estimation of Potential Outcomes and Causal Effects. arXiv, 2020.

[31] George Cybenko et al. Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals and Systems, 1989.

[32] J. Pearl. Causality: Models, Reasoning and Inference. 2000.

[33] Tom Burr et al. Causation, Prediction, and Search. Technometrics, 2003.

[34] Bernhard Schölkopf et al. A Kernel Method for the Two-Sample-Problem. NIPS, 2006.

[35] John E. Angus et al. The Probability Integral Transform and Related Results. SIAM Review, 1994.

[36] Tianqi Chen et al. XGBoost: A Scalable Tree Boosting System. KDD, 2016.

[37] E. Gumbel. Statistical Theory of Extreme Values and Some Practical Applications: A Series of Lectures. 1954.

[38] Yoshua Bengio et al. Generative Adversarial Nets. NIPS, 2014.

[39] Murat Kocaoglu et al. Causal Discovery from Soft Interventions with Unknown Targets: Characterization and Learning. NeurIPS, 2020.

[40] Elias Bareinboim et al. General Identifiability with Arbitrary Surrogate Experiments. UAI, 2019.

[41] Nathan Kallus et al. DeepMatch: Balancing Deep Covariate Representations for Causal Inference Using Adversarial Training. ICML, 2018.

[42] Alexandre Lacoste et al. Differentiable Causal Discovery from Interventional Data. NeurIPS, 2020.

[43] Johannes Textor et al. Complete Graphical Characterization and Construction of Adjustment Sets in Markov Equivalence Classes of Ancestral Graphs. Journal of Machine Learning Research, 2016.

[44] Elias Bareinboim et al. Estimating Causal Effects Using Weighting-Based Estimators. AAAI, 2020.

[45] Jiji Zhang et al. On the completeness of orientation rules for causal discovery in the presence of latent confounders and selection bias. Artificial Intelligence, 2008.

[46] Geoffrey E. Hinton et al. ImageNet classification with deep convolutional neural networks. Communications of the ACM, 2012.

[47] E. Bareinboim et al. On Pearl’s Hierarchy and the Foundations of Causal Inference. 2022.

[48] Elias Bareinboim et al. Characterizing Optimal Mixed Policies: Where to Intervene and What to Observe. NeurIPS, 2020.

[49] Elias Bareinboim et al. Causal inference and the data-fusion problem. Proceedings of the National Academy of Sciences, 2016.

[50] Kurt Hornik et al. Approximation capabilities of multilayer feedforward networks. Neural Networks, 1991.

[51] Jiji Zhang et al. Causal Identification under Markov Equivalence. UAI, 2018.

[52] Jiji Zhang et al. Causal Identification under Markov Equivalence: Completeness Results. ICML, 2019.

[53] Guigang Zhang et al. Deep Learning. International Journal of Semantic Computing, 2016.

[54] Elias Bareinboim et al. Bandits with Unobserved Confounders: A Causal Approach. NIPS, 2015.

[55] Elias Bareinboim et al. Counterfactual Data-Fusion for Online Reinforcement Learners. ICML, 2017.

[56] Liwei Wang et al. The Expressive Power of Neural Networks: A View from the Width. NIPS, 2017.

[57] Lei Sun et al. Adversarial balancing-based representation learning for causal effect inference with observational data. Data Mining and Knowledge Discovery, 2019.

[58] Bernhard Schölkopf et al. Elements of Causal Inference: Foundations and Learning Algorithms. 2017.

[59] Uri Shalit et al. Learning Representations for Counterfactual Inference. ICML, 2016.

[60] Alex Graves et al. Playing Atari with Deep Reinforcement Learning. arXiv, 2013.

[61] Alexandros G. Dimakis et al. CausalGAN: Learning Causal Implicit Generative Models with Adversarial Training. ICLR, 2017.

[62] Elias Bareinboim et al. General Transportability of Soft Interventions: Completeness Results. NeurIPS, 2020.

[63] Ruocheng Guo et al. A Survey of Learning Causality with Data. ACM Computing Surveys, 2018.

[64] Qiang Liu et al. A Kernelized Stein Discrepancy for Goodness-of-fit Tests. ICML, 2016.

[65] Jin Tian et al. Learning Causal Effects via Weighted Empirical Risk Minimization. NeurIPS, 2020.

[66] I. Guyon et al. Explainable and Interpretable Models in Computer Vision and Machine Learning. The Springer Series on Challenges in Machine Learning, 2017.

[67] Judea Pearl et al. Probabilistic reasoning in intelligent systems. 1988.

[68] Christopher Joseph Pal et al. A Meta-Transfer Objective for Learning to Disentangle Causal Mechanisms. ICLR, 2019.

[69] Aidong Zhang et al. Representation Learning for Treatment Effect Estimation from Observational Data. NeurIPS, 2018.