Masked Gradient-Based Causal Structure Learning

Learning causal graphical models based on directed acyclic graphs is an important task in causal discovery and causal inference. We consider a general framework for efficient causal structure learning on potentially large graphs. Within this framework, we propose a masked gradient-based structure learning method built on the binary adjacency matrix, which exists for any structural equation model. To enable first-order optimization methods, we use the Gumbel-Softmax approach to approximate the binary-valued entries of the adjacency matrix, which usually yields real values close to zero or one. The proposed method can readily incorporate any differentiable score function and model function for learning causal structures. Experiments on both synthetic and real-world datasets show the effectiveness of our approach.
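The relaxation of the binary adjacency matrix can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names (`gumbel_sigmoid`, `sample_soft_adjacency`, `masked_inputs`) are hypothetical, and it assumes the two-class form of Gumbel-Softmax, in which the difference of two Gumbel draws reduces to logistic noise added to a per-edge logit. It also assumes the convention that entry (i, j) represents the edge from variable i to variable j.

```python
import numpy as np

def gumbel_sigmoid(logits, tau, rng):
    """Relaxed Bernoulli sample: sigmoid((logits + logistic noise) / tau)."""
    # Clip the uniform draws away from 0 and 1 so the noise stays finite.
    u = rng.uniform(1e-9, 1.0 - 1e-9, size=logits.shape)
    # Logistic(0, 1) noise, distributed as the difference of two Gumbel(0, 1) draws.
    logistic_noise = np.log(u) - np.log1p(-u)
    return 1.0 / (1.0 + np.exp(-(logits + logistic_noise) / tau))

def sample_soft_adjacency(logits, tau=0.2, seed=0):
    """Sample a soft adjacency matrix with entries in [0, 1] and no self-loops."""
    rng = np.random.default_rng(seed)
    soft = gumbel_sigmoid(logits, tau, rng)
    np.fill_diagonal(soft, 0.0)  # a variable is never its own parent
    return soft

def masked_inputs(X, soft_adj):
    """Mask the data once per target variable.

    X has shape (n, d). The result has shape (d, n, d), where result[j]
    equals X scaled column-wise by soft_adj[:, j], the soft parent
    indicators of variable j (assumed convention: soft_adj[i, j] is i -> j).
    """
    return X[None, :, :] * soft_adj.T[:, None, :]

# Usage: strongly positive or negative logits give entries near one or zero.
logits = np.full((3, 3), -30.0)
logits[0, 1] = 30.0  # encourage the edge 0 -> 1
A = sample_soft_adjacency(logits, tau=0.2, seed=1)
X = np.ones((5, 3))
M = masked_inputs(X, A)  # M[j] is the input a model for variable j would see
```

A lower temperature `tau` pushes samples closer to zero or one, at the cost of higher-variance gradients in a differentiable implementation. The sketch covers only the sampling and masking steps; the score function, the model function, and any acyclicity handling are outside its scope.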
