Gradient-Based Neural DAG Learning

We propose a novel score-based approach to learning a directed acyclic graph (DAG) from observational data. We adapt a recently proposed continuous constrained optimization formulation to allow for nonlinear relationships between variables using neural networks. This extension makes it possible to model complex interactions while avoiding the combinatorial nature of the problem. In addition to comparing our method to existing continuous optimization methods, we provide missing empirical comparisons to nonlinear greedy search methods. On both synthetic and real-world data sets, this new method outperforms current continuous methods on most tasks, while being competitive with existing greedy search methods on important metrics for causal inference.
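To illustrate what a continuous acyclicity constraint can look like, here is a minimal sketch. It assumes the formulation is of the trace-exponential kind introduced in "DAGs with NO TEARS", where a non-negative weighted adjacency matrix A over d variables must satisfy h(A) = tr(exp(A)) - d = 0. The abstract does not state the exact constraint or how the neural networks produce A, so the function names and the truncated Taylor series below are illustrative assumptions, not the authors' implementation.

```python
def matmul(X, Y):
    """Multiply two square matrices given as lists of lists."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def acyclicity_penalty(A, terms=30):
    """Approximate h(A) = tr(exp(A)) - d with a truncated Taylor series.

    A is a non-negative weighted adjacency matrix (A[i][j] > 0 means an
    edge i -> j). tr(A^k) sums the weights of all closed walks of length k,
    so the penalty is zero exactly when the graph has no directed cycle.
    `terms` (hypothetical parameter) is the number of series terms kept.
    """
    d = len(A)
    power = [row[:] for row in A]  # holds A^k / k!, starting at k = 1
    h = 0.0
    for k in range(1, terms + 1):
        h += sum(power[i][i] for i in range(d))
        # Advance to A^(k+1) / (k+1)!
        power = [[v / (k + 1) for v in row] for row in matmul(power, A)]
    return h


# A DAG (0 -> 1, 0 -> 2, 1 -> 2) and a 3-cycle (0 -> 1 -> 2 -> 0).
dag = [[0.0, 1.0, 1.0], [0.0, 0.0, 1.0], [0.0, 0.0, 0.0]]
cycle = [[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]]

print(acyclicity_penalty(dag))        # 0.0
print(acyclicity_penalty(cycle) > 0)  # True
```

Because h is differentiable in the entries of A, a penalty of this kind can be driven to zero with gradient-based constrained optimization instead of a combinatorial search over graphs, which is the property the abstract relies on.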
