Improving Causal Discovery By Optimal Bayesian Network Learning

Many widely used causal discovery methods, such as Greedy Equivalence Search (GES), come with asymptotic correctness guarantees but have been reported to produce sub-optimal solutions on finite data or when the causal faithfulness condition is violated. Constraint-based procedures that use a Boolean satisfiability (SAT) solver and the recently proposed Sparsest Permutation (SP) algorithm have shown strong performance, but they currently do not scale well. In this work, we demonstrate that optimal score-based exhaustive search is remarkably useful for causal discovery: it requires weaker conditions to guarantee asymptotic correctness, and it outperforms well-known methods including PC, GES, GSP, and NOTEARS. To achieve scalability, we also develop an approximation algorithm for larger systems based on the A* method, which scales up to 60+ variables and obtains better results than existing greedy algorithms such as GES, MMHC, and GSP. Our results illustrate the risk of relying on the faithfulness assumption, the advantages of exhaustive search methods, and the limitations of greedy search methods, and they shed light on the computational challenges and techniques involved in scaling up to larger networks and handling unfaithful data.
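To make the idea of optimal score-based search concrete, the following is a minimal sketch of exact structure learning by dynamic programming over variable subsets. It is a generic textbook-style formulation, not the implementation evaluated in this work, and it assumes linear-Gaussian data scored with BIC. The function names `local_bic` and `optimal_dag` are hypothetical, and the approach is only practical for a handful of variables because it enumerates all subsets.

```python
import itertools
import numpy as np


def local_bic(data, child, parents):
    """BIC contribution of one node given its parents (linear-Gaussian model).

    Additive constants that do not depend on the parent set are dropped.
    """
    n = data.shape[0]
    y = data[:, child]
    X = np.column_stack([np.ones(n)] + [data[:, p] for p in parents])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = float(np.sum((y - X @ beta) ** 2))
    # Maximized log-likelihood term minus the BIC complexity penalty.
    return -0.5 * n * np.log(rss / n) - 0.5 * np.log(n) * (len(parents) + 1)


def optimal_dag(data):
    """Return (score, parents) for a highest-scoring DAG over all variables.

    `parents` maps each node index to a frozenset of its parent indices.
    """
    d = data.shape[1]

    # Step 1: for every node v and candidate set C, find the best-scoring
    # parent set that is a subset of C (sets are visited by increasing size).
    best_parents = {}
    for v in range(d):
        others = [u for u in range(d) if u != v]
        for r in range(d):
            for cand in itertools.combinations(others, r):
                cand = frozenset(cand)
                best = (local_bic(data, v, sorted(cand)), cand)
                for u in cand:
                    sub = best_parents[(v, cand - {u})]
                    if sub[0] > best[0]:
                        best = sub
                best_parents[(v, cand)] = best

    # Step 2: the best network on a subset S has some sink node whose
    # parents come from S minus that sink; try every choice of sink.
    best_net = {frozenset(): (0.0, {})}
    for r in range(1, d + 1):
        for subset in itertools.combinations(range(d), r):
            subset = frozenset(subset)
            best = None
            for sink in subset:
                rest = subset - {sink}
                rest_score, rest_parents = best_net[rest]
                sink_score, sink_parents = best_parents[(sink, rest)]
                total = rest_score + sink_score
                if best is None or total > best[0]:
                    best = (total, {**rest_parents, sink: sink_parents})
            best_net[subset] = best

    return best_net[frozenset(range(d))]
```

Calling `optimal_dag(data)` on an `n x d` NumPy array returns the best total score and one optimal parent assignment. Because BIC is score-equivalent, the returned DAG should be read as a representative of its Markov equivalence class rather than as a unique orientation of every edge.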
