Variational Causal Networks: Approximate Bayesian Inference over Causal Structures

Learning the causal structure that underlies data is a crucial step towards robust real-world decision making. The majority of existing work in causal inference focuses on determining a single directed acyclic graph (DAG) or a Markov equivalence class thereof. However, acting intelligently upon knowledge about causal structure inferred from finite data demands reasoning about its uncertainty. For instance, planning interventions to find out more about the causal mechanisms that govern our data requires quantifying epistemic uncertainty over DAGs. While Bayesian causal inference allows us to do so, the posterior over DAGs becomes intractable even for a small number of variables. To overcome this issue, we propose a form of variational inference over the graphs of Structural Causal Models (SCMs). To this end, we introduce a parametric variational family modelled by an autoregressive distribution over the space of discrete DAGs. The number of parameters of this family does not grow exponentially with the number of variables, and it can be learned tractably by maximising an Evidence Lower Bound (ELBO). In our experiments, we demonstrate that the proposed variational posterior provides a good approximation of the true posterior.

[Figure 1: Schematic diagram of variational inference on SCMs with VCN.]

Moving from learning correlation and association in data to causation is a critical step towards increased robustness, interpretability and real-world decision-making [27, 34]. Doing so entails learning the causal structure underlying the data generating process. Causal inference is concerned with determining the causal structure of a set of random variables from data, commonly represented as a directed acyclic graph (DAG) [29]. While Structural Causal Models (SCMs) provide a generative model over the data, they are hard to learn from data due to the non-identifiability of causal models without interventional data [7] and the combinatorial nature of the space of DAGs [15]. Even with an infinite amount of data, recovering the causal structure is intrinsically hard, since a DAG is only identifiable up to its Markov equivalence class (MEC) and the space of possible DAGs grows super-exponentially with the number of variables. While the majority of work on causal inference [6, 5, 38] aims to recover a single underlying causal structure without a probabilistic treatment, such approaches cannot quantify the epistemic uncertainty that is crucial in the case of non-identifiability.

In this work, we take a Bayesian approach to causal structure learning. Given only finite observational data, a Bayesian approach allows us to quantify the uncertainty in the causal structure of the data generating process, even before performing interventions. Such a framework over causal structures can support downstream tasks in graph learning and causal inference. For example, we can leverage the model's uncertainty to select informative interventions and discover the full graph with a minimal number of interventions [2]. Moreover, estimating the causal effect of certain variables often does not require knowledge of the full graph; by marginalising the posterior, we can quantify our confidence about a specific causal effect, whose uncertainty may already fall below a specified tolerance level.
Having such a model is highly desirable, as interventions are hard to perform, sometimes unethical, and occasionally outright impossible [29]. A key question in learning causal structures is identifiability from observational data. While some assumptions are always required to say anything about the underlying causal process, additional assumptions are sometimes made specifically to render the causal model identifiable from (observational) data. These additional assumptions are not necessarily part of the data generating process, and hence the recovered causal structure may be incorrect due to model misspecification. In addition, identifiability results are usually asymptotic in the number of samples; causal discovery in a limited-sample regime calls for actively performing interventions to improve identifiability, and such setups benefit from probabilistic reasoning about unknown causal structures. We are interested in such settings and propose a Bayesian framework for linear SCMs with additive noise that quantifies the uncertainty in the learned causal structure. Our contributions are as follows:

• We perform Bayesian inference over unknown causal structures. Since the posterior over these structures is intractable, we perform variational inference and demonstrate how to model the variational family over this distribution (the resulting objective is sketched after this list).
• As our key contribution, we model distributions over the adjacency matrices of causal structures with an autoregressive distribution parameterised by an LSTM (see the code sketch following this list).
• We empirically demonstrate the performance of our modelling choice.
• Evaluating Bayesian causal inference techniques is hard in practice. We discuss the difficulties entailed in evaluation and provide insights that alleviate this problem.
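To make the first contribution concrete, the following is the standard form a graph-posterior ELBO takes in this setting; the notation (G for a DAG, D for the observed data, q_φ for the variational distribution over graphs) is our shorthand rather than a transcription from the paper:

\[
\log p(\mathcal{D}) \;=\; \log \sum_{G} p(\mathcal{D} \mid G)\, p(G) \;\geq\; \mathbb{E}_{q_{\phi}(G)}\big[\log p(\mathcal{D} \mid G)\big] \;-\; \mathrm{KL}\big(q_{\phi}(G) \,\|\, p(G)\big) \;=\; \mathcal{L}(\phi)
\]

Here p(D | G) would be the marginal likelihood of the linear additive-noise SCM under graph G, and p(G) a prior over DAGs; maximising L(φ) sidesteps the intractable sum over the super-exponentially many DAGs.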
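As a minimal illustration of the second contribution, the sketch below shows one way to parameterise an autoregressive distribution over adjacency-matrix entries with an LSTM in PyTorch. The class name, hyperparameters, and the factorisation q(G) = Π_i q(g_i | g_<i) over off-diagonal entries are our illustrative assumptions, not the authors' exact architecture:

import torch
import torch.nn as nn

class AutoregressiveGraphDistribution(nn.Module):
    """Sketch of q_phi(G): an LSTM emits one Bernoulli per off-diagonal
    adjacency-matrix entry, each conditioned on all previous decisions."""
    def __init__(self, num_nodes, hidden_size=64):
        super().__init__()
        self.num_entries = num_nodes * (num_nodes - 1)   # off-diagonal entries
        self.embed = nn.Embedding(2, hidden_size)        # embed previous edge decision
        self.cell = nn.LSTMCell(hidden_size, hidden_size)
        self.logit = nn.Linear(hidden_size, 1)           # Bernoulli logit per entry

    def sample_with_log_prob(self, batch_size):
        h = torch.zeros(batch_size, self.cell.hidden_size)
        c = torch.zeros_like(h)
        prev = torch.zeros(batch_size, dtype=torch.long)  # dummy "start" decision
        entries, log_q = [], torch.zeros(batch_size)
        for _ in range(self.num_entries):
            h, c = self.cell(self.embed(prev), (h, c))
            dist = torch.distributions.Bernoulli(logits=self.logit(h).squeeze(-1))
            e = dist.sample()                             # 0/1 edge decision
            log_q = log_q + dist.log_prob(e)              # accumulates log q_phi(G)
            entries.append(e)
            prev = e.long()
        return torch.stack(entries, dim=-1), log_q       # (B, d(d-1)) entries, log-probs

Because the samples are discrete, the gradient of the ELBO with respect to the variational parameters can be estimated with a score-function (REINFORCE) estimator, optionally with learned control variates; the sampled entries are then reshaped into an adjacency matrix, with acyclicity encouraged through the prior over graphs.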

References
[1] Jonas Peters et al. Causal inference by using invariant prediction: identification and confidence intervals. 2015. arXiv:1501.01332.
[2] Addendum on the scoring of Gaussian directed acyclic graphical models. 2014. arXiv:1402.6863.
[3] W. Wong et al. Learning Causal Bayesian Network Structures From Experimental Data. 2008.
[4] Peter Bühlmann et al. CAM: Causal Additive Models, high-dimensional order search and penalized regression. 2013. arXiv.
[5] David Maxwell Chickering et al. Optimal Structure Identification With Greedy Search. Journal of Machine Learning Research, 2002.
[6] Mo Yu et al. DAG-GNN: DAG Structure Learning with Graph Neural Networks. ICML, 2019.
[7] Christina Heinze-Deml et al. Causal Structure Learning. 2017. arXiv:1706.09141.
[8] David Heckerman et al. Parameter Priors for Directed Acyclic Graphical Models and the Characterization of Several Probability Distributions. UAI, 1999.
[9] Shohei Shimizu et al. LiNGAM: Non-Gaussian Methods for Estimating Causal Structures. Behaviormetrika, 2014.
[10] Christina Heinze-Deml et al. Invariant Causal Prediction for Nonlinear Models. Journal of Causal Inference, 2017.
[11] Giusi Moffa et al. Partition MCMC for Inference on Acyclic Digraphs. 2015. arXiv:1504.05006.
[12] Tom Burr et al. Causation, Prediction, and Search. Technometrics, 2003.
[13] Ronald J. Williams et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning. Machine Learning, 2004.
[14] Kumar Krishna Agrawal et al. Discrete Flows: Invertible Generative Models of Discrete Data. DGS@ICLR, 2019.
[15] R. Scheines et al. Interventions and Causal Inference. Philosophy of Science, 2007.
[16] Peter Bühlmann et al. Characterization and Greedy Learning of Interventional Markov Equivalence Classes of Directed Acyclic Graphs (Abstract). UAI, 2011.
[17] Nan Rosemary Ke et al. Learning Neural Causal Models from Unknown Interventions. 2019. arXiv.
[18] Nir Friedman et al. Being Bayesian About Network Structure. A Bayesian Approach to Structure Discovery in Bayesian Networks. Machine Learning, 2004.
[19] Chandler Squires et al. ABCD-Strategy: Budgeted Experimental Design for Targeted Causal Structure Discovery. AISTATS, 2019.
[20] N. D. Clarke et al. Towards a Rigorous Assessment of Systems Biology Models: The DREAM3 Challenges. PLoS ONE, 2010.
[21] Dario Floreano et al. GeneNetWeaver: in silico benchmark generation and performance profiling of network inference methods. Bioinformatics, 2011.
[22] Clark Glymour et al. A million variables and more: the Fast Greedy Equivalence Search algorithm for learning high-dimensional graphical causal models, with an application to functional magnetic resonance images. International Journal of Data Science and Analytics, 2016.
[23] Pradeep Ravikumar et al. DAGs with NO TEARS: Continuous Optimization for Structure Learning. NeurIPS, 2018.
[24] Tristan Deleu et al. Gradient-Based Neural DAG Learning. ICLR, 2019.
[25] Marco Grzegorczyk et al. Improving the structure MCMC sampler for Bayesian networks by introducing a new edge reversal move. Machine Learning, 2008.
[26] Martin Szummer et al. Amortized learning of neural causal representations. 2020. arXiv.
[27] Mikko Koivisto et al. Structure Discovery in Bayesian Networks by Sampling Partial Orders. Journal of Machine Learning Research, 2016.
[28] Emiel Hoogeboom et al. Integer Discrete Flows and Lossless Compression. NeurIPS, 2019.
[29] Tamara Broderick et al. Minimal I-MAP MCMC for Scalable Structure Discovery in Causal DAG Models. ICML, 2018.
[30] Jürgen Schmidhuber et al. Long Short-Term Memory. Neural Computation, 1997.
[31] David M. Blei et al. Variational Inference: A Review for Statisticians. 2016. arXiv.
[32] Jimmy Ba et al. Adam: A Method for Stochastic Optimization. ICLR, 2014.
[33] D. Heckerman et al. A Bayesian Approach to Causal Discovery. 2006.
[34] David Duvenaud et al. Backpropagation through the Void: Optimizing control variates for black-box gradient estimation. ICLR, 2017.
[35] Christopher Joseph Pal et al. A Meta-Transfer Objective for Learning to Disentangle Causal Mechanisms. ICLR, 2019.
[36] J. York et al. Bayesian Graphical Models for Discrete Data. 1995.