A Recursive Method for Structural Learning of Directed Acyclic Graphs

In this paper, we propose a recursive method for structural learning of directed acyclic graphs (DAGs), in which the problem of structural learning for a large DAG is first decomposed into two problems of structural learning for two smaller vertex subsets, each of which is then decomposed recursively into two problems for still smaller subsets until no subset can be decomposed further. In our approach, the search for separators of a pair of variables in a large DAG is localized to small subsets, and thus the approach can improve the efficiency of searches and the power of statistical tests for structural learning. We show how recent advances in the learning of undirected graphical models can be employed to facilitate the decomposition. Simulations are given to demonstrate the performance of the proposed method.
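The recursive decomposition summarized above can be illustrated with a toy sketch. It assumes that an undirected independence graph over the variables is already available as an adjacency dictionary, and it searches by brute force for a small vertex set C whose removal splits the remaining vertices into two parts A and B. The function names (`components`, `find_split`, `decompose`), the `max_sep` bound, and the reuse of the induced subgraph at each level are illustrative assumptions, not the procedure of the paper; the sketch shows only the splitting recursion, not the learning of local structures or their combination.

```python
from itertools import combinations


def components(adj, vertices):
    """Connected components of the subgraph induced on `vertices`."""
    vertices = set(vertices)
    seen, comps = set(), []
    for start in sorted(vertices):
        if start in seen:
            continue
        comp, stack = set(), [start]
        while stack:
            v = stack.pop()
            if v in comp:
                continue
            comp.add(v)
            # Only follow edges that stay inside the induced subgraph.
            stack.extend((adj[v] & vertices) - comp)
        seen |= comp
        comps.append(comp)
    return comps


def find_split(adj, vertices, max_sep=2):
    """Return (A, B, C) where removing C disconnects A from B, else None.

    Brute-force search over candidate separators of size <= max_sep
    (an assumption made for this sketch only).
    """
    vertices = set(vertices)
    for size in range(max_sep + 1):
        for sep in combinations(sorted(vertices), size):
            c = set(sep)
            comps = components(adj, vertices - c)
            if len(comps) >= 2:
                a = comps[0]
                b = set().union(*comps[1:])
                return a, b, c
    return None


def decompose(adj, vertices, max_sep=2):
    """Recursively split `vertices` into subsets A|C and B|C.

    Recursion stops when no separator of size <= max_sep is found;
    each leaf is a subset on which local learning would be run.
    """
    split = find_split(adj, vertices, max_sep)
    if split is None:
        return [frozenset(vertices)]
    a, b, c = split
    return decompose(adj, a | c, max_sep) + decompose(adj, b | c, max_sep)


# Hypothetical independence graph: the chain 1 - 2 - 3 - 4.
chain = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
leaves = decompose(chain, {1, 2, 3, 4})
```

For the chain above, the recursion first splits on the separator {2} and then on {3}, so the leaves are the three overlapping pairs {1, 2}, {2, 3}, and {3, 4}. Each recursive call works on a strictly smaller vertex set, because the part on the other side of the separator is never empty, so the recursion terminates.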
