High-dimensional learning of linear causal networks via inverse covariance estimation

We establish a new framework for statistical estimation of directed acyclic graphs (DAGs) when data are generated from a linear, possibly non-Gaussian structural equation model (SEM). Our framework consists of two parts: (1) inferring the moralized graph from the support of the inverse covariance matrix; and (2) selecting the best-scoring graph among the DAGs that are consistent with the moralized graph. We show that when the error variances are known or estimated with sufficient precision, the true DAG is the unique minimizer of the score computed using the reweighted squared ℓ2-loss. Our population-level results have implications for the identifiability of linear SEMs when the error covariances are specified up to a constant multiple. On the statistical side, we establish rigorous conditions for high-dimensional consistency of our two-part algorithm, defined in terms of a "gap" between the score of the true DAG and that of the next best candidate. Finally, we demonstrate that dynamic programming may be used to select the optimal DAG in linear time when the treewidth of the moralized graph is bounded.
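The two-part idea can be illustrated at the population level with a small sketch. It assumes a linear SEM written as X = B^T X + ε with diagonal error covariance Ω, so that the inverse covariance is (I - B) Ω^{-1} (I - B)^T. The example DAG, the weights in `B`, the variances in `omega`, and all function names are hypothetical choices made for illustration. The score below is one plausible reading of the reweighted squared ℓ2-loss (residual variances divided by the known error variances) and is not taken from the abstract. The brute-force search over orderings stands in for the dynamic program and is only feasible for tiny graphs.

```python
import itertools
import numpy as np

def precision_from_sem(B, omega):
    """Inverse covariance of X = B^T X + eps, with Var(eps) = diag(omega)."""
    A = np.eye(B.shape[0]) - B
    return A @ np.diag(1.0 / omega) @ A.T

def support_edges(theta, tol=1e-10):
    """Undirected edges (j, k), j < k, where theta is nonzero off the diagonal."""
    p = theta.shape[0]
    return {(j, k) for j in range(p) for k in range(j + 1, p)
            if abs(theta[j, k]) > tol}

def moral_edges(B):
    """Moralized graph of the DAG encoded by B (B[i, j] != 0 means i -> j)."""
    edges = set()
    for child in range(B.shape[0]):
        parents = [int(i) for i in np.flatnonzero(B[:, child])]
        edges.update((min(i, child), max(i, child)) for i in parents)
        edges.update(itertools.combinations(parents, 2))  # marry the parents
    return edges

def ordering_score(sigma, omega, order, edges):
    """Sum over nodes of (residual variance / error variance), where each node
    is regressed on the moral-graph neighbors that precede it in `order`."""
    score = 0.0
    for pos, j in enumerate(order):
        cand = [i for i in order[:pos] if (min(i, j), max(i, j)) in edges]
        resid = sigma[j, j]
        if cand:
            s_cj = sigma[np.ix_(cand, [j])]
            s_cc = sigma[np.ix_(cand, cand)]
            resid -= (s_cj.T @ np.linalg.solve(s_cc, s_cj)).item()
        score += resid / omega[j]
    return score

# Hypothetical DAG: 0 -> 2 <- 1, 2 -> 3, with arbitrary weights and variances.
B = np.zeros((4, 4))
B[0, 2], B[1, 2], B[2, 3] = 0.8, -0.5, 1.2
omega = np.array([1.0, 2.0, 0.5, 1.5])

# Part 1: read the moralized graph off the support of the inverse covariance.
theta = precision_from_sem(B, omega)
edges = support_edges(theta)

# Part 2: score every ordering, restricting parents to moral-graph neighbors.
sigma = np.linalg.inv(theta)
scores = {order: ordering_score(sigma, omega, order, edges)
          for order in itertools.permutations(range(4))}
best = min(scores.values())
best_orders = {o for o, s in scores.items() if abs(s - best) < 1e-8}

print(sorted(edges))
print(best, sorted(best_orders))
```

In this toy case the support of the inverse covariance includes the edge between nodes 0 and 1, which are not adjacent in the DAG but share the child 2. The minimal score equals the number of nodes and is attained only by orderings compatible with the generating DAG, in line with the uniqueness claim for known error variances.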
