Causal discovery with continuous additive noise models

We consider the problem of learning causal directed acyclic graphs from an observational joint distribution. One can use these graphs to predict the outcome of interventional experiments, for which data are often unavailable. We show that if the observational distribution follows a structural equation model with an additive noise structure, the directed acyclic graph becomes identifiable from the distribution under mild conditions. This constitutes an interesting alternative to traditional methods that assume faithfulness and identify only the Markov equivalence class of the graph, thus leaving some edges undirected. We provide practical algorithms for finitely many samples: RESIT (regression with subsequent independence test) and two methods based on an independence score. We prove that RESIT is correct in the population setting and provide an empirical evaluation.
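
The idea of regressing and then checking the residuals for independence can be illustrated in the two-variable case. The following is a minimal sketch and not the RESIT procedure itself, which handles graphs with more than two variables. It assumes synthetic data generated as Y = X^3 + N, and it uses polynomial least squares and a biased HSIC statistic with a Gaussian kernel as stand-ins for the regression method and the independence test. The function names, the polynomial degree, and the bandwidth heuristic are all choices made for this illustration.

```python
import numpy as np


def standardize(v):
    """Rescale a sample to zero mean and unit variance."""
    return (v - v.mean()) / v.std()


def rbf_gram(v):
    """Gaussian-kernel Gram matrix; bandwidth set by a median heuristic (assumption)."""
    d2 = (v[:, None] - v[None, :]) ** 2
    bandwidth2 = np.median(d2[d2 > 0])
    return np.exp(-d2 / bandwidth2)


def hsic(a, b):
    """Biased empirical HSIC statistic: trace(K H L H) / n^2."""
    n = len(a)
    H = np.eye(n) - np.ones((n, n)) / n  # centering matrix
    K, L = rbf_gram(a), rbf_gram(b)
    return np.trace(K @ H @ L @ H) / n**2


def residual_dependence(cause, effect, degree=5):
    """Regress effect on cause, then measure dependence between cause and residuals."""
    coeffs = np.polyfit(cause, effect, degree)
    residuals = effect - np.polyval(coeffs, cause)
    return hsic(cause, residuals)


# Synthetic additive noise model (illustrative data, not from the paper).
rng = np.random.default_rng(0)
n = 300
x = rng.uniform(-2.0, 2.0, n)
y = x**3 + rng.normal(0.0, 1.0, n)

xs, ys = standardize(x), standardize(y)

score_xy = residual_dependence(xs, ys)  # candidate direction X -> Y
score_yx = residual_dependence(ys, xs)  # candidate direction Y -> X

# Prefer the direction whose residuals look less dependent on the input.
direction = "X -> Y" if score_xy < score_yx else "Y -> X"
print(f"HSIC X->Y: {score_xy:.5f}, HSIC Y->X: {score_yx:.5f}, inferred: {direction}")
```

Comparing the two raw statistics is the simplest possible decision rule. A fuller treatment would turn each statistic into a p-value and would also allow the outcome that neither direction admits an additive noise model.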
