Reliable Causal Discovery with Improved Exact Search and Weaker Assumptions

Many causal discovery methods rely on the faithfulness assumption to guarantee asymptotic correctness. However, this assumption can be approximately violated in many ways, leading to sub-optimal solutions. One line of research in Bayesian network structure learning weakens the assumption, for example through exact search methods with well-defined score functions, but these methods do not scale well to large graphs. In this work, we introduce several strategies to improve the scalability of exact score-based methods in the linear Gaussian setting. In particular, we develop a super-structure estimation method based on the support of the inverse covariance matrix, which requires assumptions that are strictly weaker than faithfulness, and apply it to restrict the search space of exact search. We also propose a local search strategy that performs exact search on the local clusters formed by each variable and its neighbors within two hops in the super-structure. Numerical experiments validate the efficacy of the proposed procedure and demonstrate that it scales up to hundreds of nodes with high accuracy.
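The two structural ingredients described above can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes an estimate of the inverse covariance (precision) matrix is already available, for instance from a sparse estimator such as the graphical lasso, and it covers neither that estimation step nor the exact search itself. The function names `super_structure` and `two_hop_cluster`, the threshold `tol`, and the example matrix are hypothetical choices made for illustration.

```python
from typing import Dict, Sequence, Set


def super_structure(
    theta: Sequence[Sequence[float]], tol: float = 1e-8
) -> Dict[int, Set[int]]:
    """Undirected graph given by the off-diagonal support of a precision matrix.

    `theta` is assumed to be an already-estimated inverse covariance matrix;
    `tol` is an illustrative threshold for treating an entry as nonzero.
    """
    d = len(theta)
    adj: Dict[int, Set[int]] = {i: set() for i in range(d)}
    for i in range(d):
        for j in range(i + 1, d):
            # Keep the edge if either symmetric entry is numerically nonzero.
            if abs(theta[i][j]) > tol or abs(theta[j][i]) > tol:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def two_hop_cluster(adj: Dict[int, Set[int]], v: int) -> Set[int]:
    """Variable `v` together with every node within two hops of it in `adj`."""
    cluster = {v} | adj[v]
    for u in adj[v]:
        cluster |= adj[u]
    return cluster


# Hypothetical tridiagonal precision matrix over five variables, whose
# support is the path 0 - 1 - 2 - 3 - 4.
theta_example = [
    [2.0, -1.0, 0.0, 0.0, 0.0],
    [-1.0, 2.0, -1.0, 0.0, 0.0],
    [0.0, -1.0, 2.0, -1.0, 0.0],
    [0.0, 0.0, -1.0, 2.0, -1.0],
    [0.0, 0.0, 0.0, -1.0, 1.0],
]

adj_example = super_structure(theta_example)
clusters_example = {v: two_hop_cluster(adj_example, v) for v in adj_example}
```

In this sketch each cluster in `clusters_example` would be the variable set handed to a separate exact-search run, with candidate edges restricted to those present in `adj_example`. How the local results are combined into one graph is not specified here.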
