A Graph Autoencoder Approach to Causal Structure Learning

Causal structure learning has been a challenging task in the past decades and several mainstream approaches such as constraint- and score-based methods have been studied with theoretical guarantees. Recently, a new approach has transformed the combinatorial structure learning problem into a continuous one and then solved it using gradient-based optimization methods. Following the recent state-of-the-arts, we propose a new gradient-based method to learn causal structures from observational data. The proposed method generalizes the recent gradient-based methods to a graph autoencoder framework that allows nonlinear structural equation models and is easily applicable to vector-valued variables. We demonstrate that on synthetic datasets, our proposed method outperforms other gradient-based methods significantly, especially on large causal graphs. We further investigate the scalability and efficiency of our method, and observe a near linear training time when scaling up the graph size.

[1]  K. Sachs,et al.  Causal Protein-Signaling Networks Derived from Multiparameter Single-Cell Data , 2005, Science.

[2]  David Maxwell Chickering,et al.  Learning Bayesian Networks: The Combination of Knowledge and Statistical Data , 1994, Machine Learning.

[3]  Mo Yu,et al.  DAG-GNN: DAG Structure Learning with Graph Neural Networks , 2019, ICML.

[4]  Lise Getoor,et al.  Scalable Probabilistic Causal Structure Discovery , 2018, IJCAI.

[5]  Bernhard Schölkopf,et al.  Nonlinear causal discovery with additive noise models , 2008, NIPS.

[6]  Nir Friedman,et al.  Probabilistic Graphical Models: Principles and Techniques - Adaptive Computation and Machine Learning , 2009 .

[7]  Christopher Meek,et al.  Causal inference and causal explanation with background knowledge , 1995, UAI.

[8]  David Maxwell Chickering,et al.  Efficient Approximations for the Marginal Likelihood of Bayesian Networks with Hidden Variables , 1997, Machine Learning.

[9]  Torsten Hoefler,et al.  Demystifying Parallel and Distributed Deep Learning , 2018, ACM Comput. Surv..

[10]  Aapo Hyvärinen,et al.  A Linear Non-Gaussian Acyclic Model for Causal Discovery , 2006, J. Mach. Learn. Res..

[11]  Jiji Zhang,et al.  On the completeness of orientation rules for causal discovery in the presence of latent confounders and selection bias , 2008, Artif. Intell..

[12]  Max Welling,et al.  Semi-Supervised Classification with Graph Convolutional Networks , 2016, ICLR.

[13]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[14]  David Maxwell Chickering,et al.  Optimal Structure Identification With Greedy Search , 2002, J. Mach. Learn. Res..

[15]  Bernhard Schölkopf,et al.  Causal Inference Using the Algorithmic Markov Condition , 2008, IEEE Transactions on Information Theory.

[16]  Dimitri P. Bertsekas,et al.  Nonlinear Programming , 1997 .

[17]  R. Bouckaert Minimum Description Length Principle , 1994 .

[18]  Aapo Hyvärinen,et al.  On the Identifiability of the Post-Nonlinear Causal Model , 2009, UAI.

[19]  Huawei Shen,et al.  Node classification framework , 2019 .

[20]  P. Spirtes,et al.  Causation, prediction, and search , 1993 .

[21]  Pradeep Ravikumar,et al.  DAGs with NO TEARS: Continuous Optimization for Structure Learning , 2018, NeurIPS.

[22]  J. Pearl Causality: Models, Reasoning and Inference , 2000 .

[23]  Bernhard Schölkopf,et al.  Elements of Causal Inference: Foundations and Learning Algorithms , 2017 .

[24]  Judea Pearl,et al.  The seven tools of causal inference, with reflections on machine learning , 2019, Commun. ACM.

[25]  Peter Bühlmann,et al.  CAM: Causal Additive Models, high-dimensional order search and penalized regression , 2013, ArXiv.