Variational Causal Networks: Approximate Bayesian Inference over Causal Structures

Learning the causal structure that underlies data is a crucial step towards robust real-world decision making. The majority of existing work in causal inference focuses on determining a single directed acyclic graph (DAG) or a Markov equivalence class thereof. However, acting intelligently upon knowledge about causal structure inferred from finite data demands reasoning about its uncertainty. For instance, planning interventions to find out more about the causal mechanisms that govern our data requires quantifying epistemic uncertainty over DAGs. While Bayesian causal inference allows us to do so, the posterior over DAGs becomes intractable even for a small number of variables. To overcome this issue, we propose a form of variational inference over the graphs of Structural Causal Models (SCMs). To this end, we introduce a parametric variational family modelled by an autoregressive distribution over the space of discrete DAGs. The number of parameters of this family does not grow exponentially with the number of variables, and it can be learned tractably by maximising an Evidence Lower Bound (ELBO). In our experiments, we demonstrate that the proposed variational posterior provides a good approximation of the true posterior.

[Figure 1: Schematic diagram of variational inference on SCMs with VCN.]

Moving from learning correlation and association in data to causation is a critical step towards increased robustness, interpretability and real-world decision-making [27, 34]. Doing so entails learning the causal structure underlying the data generating process. Causal inference is concerned with determining the causal structure of a set of random variables from data, commonly represented as a directed acyclic graph (DAG) [29]. While Structural Causal Models (SCMs) provide a generative model over the data, they are hard to learn from data due to the non-identifiability of causal models without interventional data [7] and the combinatorial nature of the space of DAGs [15]. Even with an infinite amount of data, recovering the causal structure is intrinsically hard, since a DAG is only identifiable up to its Markov equivalence class (MEC) and the space of possible DAGs grows super-exponentially with the number of variables. While the majority of work on causal inference [6, 5, 38] aims to recover a single underlying causal structure without a probabilistic treatment, such approaches cannot quantify the epistemic uncertainty that is crucial in the case of non-identifiability.

In this work, we take a Bayesian approach to causal structure learning. Given only finite observational data, a Bayesian approach allows us to quantify the uncertainty in the causal structure of the data generating process, even before performing interventions. Such a framework over causal structures can support downstream tasks in graph learning and causal inference. For example, we can leverage the model's uncertainty to select informative interventions and discover the full graph with a minimal number of interventions [2]. Moreover, estimating the causal effect of certain variables often does not require knowledge of the full graph; by marginalising the posterior, we can quantify our confidence about a specific causal effect, whose uncertainty may already fall below a specified tolerance level.
Having such a model is highly desirable, as interventions are hard to perform, sometimes unethical, and occasionally outright impossible [29]. A key question in learning causal structures is identifiability from observational data. While some assumptions are always required to say anything about the underlying causal process, additional assumptions are sometimes made specifically to render the causal model identifiable from (observational) data. These additional assumptions are not necessarily part of the data generating process, and hence the recovered causal structure may be incorrect due to model misspecification. In addition, identifiability results are usually asymptotic in the number of samples; causal discovery in a limited-sample regime calls for actively performing interventions to improve identifiability, and such setups benefit from probabilistic reasoning about unknown causal structures. We are interested in such settings and propose a Bayesian framework for linear SCMs with additive noise that quantifies the uncertainty in the learned causal structure. Our contributions are as follows:

• We perform Bayesian inference over unknown causal structures. Since the posterior over these structures is intractable, we perform variational inference and demonstrate how to model the variational family over this distribution (the resulting objective is sketched after this list).
• As our key contribution, we model distributions over the adjacency matrices of causal structures with an autoregressive distribution parameterised by an LSTM (see the code sketch following this list).
• We empirically demonstrate the performance of our modelling choice.
• Evaluating Bayesian causal inference techniques is hard in practice. We discuss the difficulties entailed in evaluation and provide insights that alleviate this problem.
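To make the first contribution concrete, the following is the standard form a graph-posterior ELBO takes in this setting; the notation (G for a DAG, D for the observed data, q_φ for the variational distribution over graphs) is our shorthand rather than a transcription from the paper:

\[
\log p(\mathcal{D}) \;=\; \log \sum_{G} p(\mathcal{D} \mid G)\, p(G) \;\geq\; \mathbb{E}_{q_{\phi}(G)}\big[\log p(\mathcal{D} \mid G)\big] \;-\; \mathrm{KL}\big(q_{\phi}(G) \,\|\, p(G)\big) \;=\; \mathcal{L}(\phi)
\]

Here p(D | G) would be the marginal likelihood of the linear additive-noise SCM under graph G, and p(G) a prior over DAGs; maximising L(φ) sidesteps the intractable sum over the super-exponentially many DAGs.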
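As a minimal illustration of the second contribution, the sketch below shows one way to parameterise an autoregressive distribution over adjacency-matrix entries with an LSTM in PyTorch. The class name, hyperparameters, and the factorisation q(G) = Π_i q(g_i | g_<i) over off-diagonal entries are our illustrative assumptions, not the authors' exact architecture:

import torch
import torch.nn as nn

class AutoregressiveGraphDistribution(nn.Module):
    """Sketch of q_phi(G): an LSTM emits one Bernoulli per off-diagonal
    adjacency-matrix entry, each conditioned on all previous decisions."""
    def __init__(self, num_nodes, hidden_size=64):
        super().__init__()
        self.num_entries = num_nodes * (num_nodes - 1)   # off-diagonal entries
        self.embed = nn.Embedding(2, hidden_size)        # embed previous edge decision
        self.cell = nn.LSTMCell(hidden_size, hidden_size)
        self.logit = nn.Linear(hidden_size, 1)           # Bernoulli logit per entry

    def sample_with_log_prob(self, batch_size):
        h = torch.zeros(batch_size, self.cell.hidden_size)
        c = torch.zeros_like(h)
        prev = torch.zeros(batch_size, dtype=torch.long)  # dummy "start" decision
        entries, log_q = [], torch.zeros(batch_size)
        for _ in range(self.num_entries):
            h, c = self.cell(self.embed(prev), (h, c))
            dist = torch.distributions.Bernoulli(logits=self.logit(h).squeeze(-1))
            e = dist.sample()                             # 0/1 edge decision
            log_q = log_q + dist.log_prob(e)              # accumulates log q_phi(G)
            entries.append(e)
            prev = e.long()
        return torch.stack(entries, dim=-1), log_q       # (B, d(d-1)) entries, log-probs

Because the samples are discrete, the gradient of the ELBO with respect to the variational parameters can be estimated with a score-function (REINFORCE) estimator, optionally with learned control variates; the sampled entries are then reshaped into an adjacency matrix, with acyclicity encouraged through the prior over graphs.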

References
[1] Jonas Peters et al. Causal inference by using invariant prediction: identification and confidence intervals. 2015. arXiv:1501.01332.
[2] Addendum on the scoring of Gaussian directed acyclic graphical models. 2014. arXiv:1402.6863.
[3] W. Wong et al. Learning Causal Bayesian Network Structures From Experimental Data. 2008.
[4] Peter Bühlmann et al. CAM: Causal Additive Models, high-dimensional order search and penalized regression. 2013. arXiv.
[5] David Maxwell Chickering et al. Optimal Structure Identification With Greedy Search. Journal of Machine Learning Research, 2002.
[6] Mo Yu et al. DAG-GNN: DAG Structure Learning with Graph Neural Networks. ICML, 2019.
[7] Christina Heinze-Deml et al. Causal Structure Learning. 2017. arXiv:1706.09141.
[8] David Heckerman et al. Parameter Priors for Directed Acyclic Graphical Models and the Characterization of Several Probability Distributions. UAI, 1999.
[9] Shohei Shimizu et al. LiNGAM: Non-Gaussian Methods for Estimating Causal Structures. Behaviormetrika, 2014.
[10] Christina Heinze-Deml et al. Invariant Causal Prediction for Nonlinear Models. Journal of Causal Inference, 2017.
[11] Giusi Moffa et al. Partition MCMC for Inference on Acyclic Digraphs. 2015. arXiv:1504.05006.
[12] Tom Burr et al. Causation, Prediction, and Search. Technometrics, 2003.
[13] Ronald J. Williams et al. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning. Machine Learning, 2004.
[14] Kumar Krishna Agrawal et al. Discrete Flows: Invertible Generative Models of Discrete Data. DGS@ICLR, 2019.
[15] R. Scheines et al. Interventions and Causal Inference. Philosophy of Science, 2007.
[16] Peter Bühlmann et al. Characterization and Greedy Learning of Interventional Markov Equivalence Classes of Directed Acyclic Graphs (Abstract). UAI, 2011.
[17] Nan Rosemary Ke et al. Learning Neural Causal Models from Unknown Interventions. 2019. arXiv.
[18] Nir Friedman et al. Being Bayesian About Network Structure. A Bayesian Approach to Structure Discovery in Bayesian Networks. Machine Learning, 2004.
[19] Chandler Squires et al. ABCD-Strategy: Budgeted Experimental Design for Targeted Causal Structure Discovery. AISTATS, 2019.
[20] N. D. Clarke et al. Towards a Rigorous Assessment of Systems Biology Models: The DREAM3 Challenges. PLoS ONE, 2010.
[21] Dario Floreano et al. GeneNetWeaver: in silico benchmark generation and performance profiling of network inference methods. Bioinformatics, 2011.
[22] Clark Glymour et al. A million variables and more: the Fast Greedy Equivalence Search algorithm for learning high-dimensional graphical causal models, with an application to functional magnetic resonance images. International Journal of Data Science and Analytics, 2016.
[23] Pradeep Ravikumar et al. DAGs with NO TEARS: Continuous Optimization for Structure Learning. NeurIPS, 2018.
[24] Tristan Deleu et al. Gradient-Based Neural DAG Learning. ICLR, 2019.
[25] Marco Grzegorczyk et al. Improving the structure MCMC sampler for Bayesian networks by introducing a new edge reversal move. Machine Learning, 2008.
[26] Martin Szummer et al. Amortized learning of neural causal representations. 2020. arXiv.
[27] Mikko Koivisto et al. Structure Discovery in Bayesian Networks by Sampling Partial Orders. Journal of Machine Learning Research, 2016.
[28] Emiel Hoogeboom et al. Integer Discrete Flows and Lossless Compression. NeurIPS, 2019.
[29] Tamara Broderick et al. Minimal I-MAP MCMC for Scalable Structure Discovery in Causal DAG Models. ICML, 2018.
[30] Jürgen Schmidhuber et al. Long Short-Term Memory. Neural Computation, 1997.
[31] David M. Blei et al. Variational Inference: A Review for Statisticians. 2016. arXiv.
[32] Jimmy Ba et al. Adam: A Method for Stochastic Optimization. ICLR, 2014.
[33] D. Heckerman et al. A Bayesian Approach to Causal Discovery. 2006.
[34] David Duvenaud et al. Backpropagation through the Void: Optimizing control variates for black-box gradient estimation. ICLR, 2017.
[35] Christopher Joseph Pal et al. A Meta-Transfer Objective for Learning to Disentangle Causal Mechanisms. ICLR, 2019.
[36] J. York et al. Bayesian Graphical Models for Discrete Data. 1995.