Learning directed acyclic graph models based on sparsest permutations

We consider the problem of learning a Bayesian network, or directed acyclic graph (DAG) model, from observational data. A number of constraint-based, score-based, and hybrid algorithms have been developed for this purpose. The statistical consistency guarantees of these algorithms rely on the faithfulness assumption, which has been shown to be restrictive, especially for graphs with cycles in the skeleton. Here we propose the sparsest permutation (SP) algorithm and show that learning Bayesian networks is possible under strictly weaker assumptions than faithfulness. This comes at a computational price, indicating a statistical-computational trade-off for causal inference algorithms. In the Gaussian noiseless setting, we prove that the SP algorithm reduces to finding the permutation of the variables with the sparsest Cholesky decomposition of the inverse covariance matrix, which is equivalent to $\ell_0$-penalized maximum likelihood estimation. We end with a simulation study showing that, in line with the proven stronger consistency guarantees, the SP algorithm compares favorably to standard causal inference algorithms in terms of accuracy for a given sample size. Copyright © 2012 John Wiley & Sons, Ltd.
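The Gaussian characterization above can be illustrated with a small sketch. It is not the authors' implementation: it assumes the exact (population) covariance matrix is available, searches all p! variable orderings by brute force (feasible only for a handful of variables), and treats Cholesky entries below a hypothetical tolerance `tol` as zero. The function names `dag_for_order` and `sparsest_permutation` are illustrative. For a causal ordering with parents first, the precision matrix factors as K = U D^{-1} U^T with U unit upper triangular; reversing the ordering turns this into the lower-triangular factor that `numpy.linalg.cholesky` returns, and its nonzero off-diagonal entries give the edges.

```python
import itertools
import numpy as np


def dag_for_order(precision, order, tol=1e-8):
    """Edges (parent, child) of the DAG implied by one causal ordering."""
    # Reverse the ordering so the factor is lower triangular.
    rev = list(order)[::-1]
    L = np.linalg.cholesky(precision[np.ix_(rev, rev)])
    edges = set()
    for a in range(len(rev)):
        for b in range(a):
            # A nonzero entry below the diagonal links a variable that is
            # earlier in the causal ordering (rev[a]) to a later one (rev[b]).
            if abs(L[a, b]) > tol:
                edges.add((rev[a], rev[b]))
    return edges


def sparsest_permutation(cov, tol=1e-8):
    """Brute-force search for an ordering whose implied DAG has fewest edges."""
    precision = np.linalg.inv(cov)
    best_order, best_edges = None, None
    for order in itertools.permutations(range(cov.shape[0])):
        edges = dag_for_order(precision, order, tol)
        if best_edges is None or len(edges) < len(best_edges):
            best_order, best_edges = order, edges
    return best_order, best_edges


# Toy example (assumed weights): collider 0 -> 2 <- 1 with unit noise variances.
B = np.zeros((3, 3))
B[0, 2] = 0.8   # weight of edge 0 -> 2
B[1, 2] = -0.5  # weight of edge 1 -> 2
A_inv = np.linalg.inv(np.eye(3) - B.T)  # X = (I - B^T)^{-1} eps
cov = A_inv @ A_inv.T

order, edges = sparsest_permutation(cov)
print(order, sorted(edges))
```

Ties between orderings are broken by enumeration order, so for graphs with a larger Markov equivalence class the sketch returns just one of the sparsest DAGs. With sample covariances the exact-zero pattern disappears, and the threshold would have to be replaced by a statistical decision rule.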
