A Gibbs Sampler for Learning DAGs

We propose a Gibbs sampler for structure learning in directed acyclic graph (DAG) models. The standard Markov chain Monte Carlo algorithms used for learning DAGs are random-walk Metropolis-Hastings samplers. These samplers are guaranteed to converge asymptotically but often mix slowly when exploring the large graph spaces that arise in structure learning. In each step, the sampler we propose draws entire sets of parents for multiple nodes from the appropriate conditional distribution. This provides an efficient way to make large moves in graph space, permitting faster mixing whilst retaining asymptotic guarantees of convergence. The conditional distribution is related to variable selection with candidate parents playing the role of covariates or inputs. We empirically examine the performance of the sampler using several simulated and real data examples. The proposed method gives robust results in diverse settings, outperforming several existing Bayesian and frequentist methods. In addition, our empirical results shed some light on the relative merits of Bayesian and constraint-based methods for structure learning.
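The abstract does not give implementation details, but the core move it describes — resampling a node's entire parent set from a conditional distribution over candidate sets, restricted to acyclic graphs — can be sketched as follows. This is a minimal illustrative sketch under our own assumptions, not the paper's algorithm: it uses a BIC-style local score on binary data as a stand-in for a proper marginal likelihood, and all function names (`bic_score`, `creates_cycle`, `gibbs_step`) are hypothetical.

```python
import itertools
import math
import random
from collections import Counter

def bic_score(i, parents, data):
    # Penalized log-likelihood of binary node i given a parent set
    # (a stand-in for the marginal likelihood a Bayesian sampler would use).
    n = len(data)
    joint = Counter((tuple(row[p] for p in parents), row[i]) for row in data)
    margin = Counter(tuple(row[p] for p in parents) for row in data)
    ll = sum(c * math.log(c / margin[cfg]) for (cfg, _), c in joint.items())
    n_params = 2 ** len(parents)  # one Bernoulli parameter per parent config
    return ll - 0.5 * n_params * math.log(n)

def creates_cycle(i, new_parents, parent_sets):
    # Adding edges p -> i closes a directed cycle iff some proposed
    # parent p is already a descendant of i; walk child edges from i.
    stack, seen = [i], set()
    while stack:
        v = stack.pop()
        if v in seen:
            continue
        seen.add(v)
        stack.extend(c for c, ps in parent_sets.items() if v in ps and c != i)
    return any(p in seen for p in new_parents)

def gibbs_step(i, parent_sets, data, rng, max_parents=2):
    # Resample node i's entire parent set from its conditional distribution,
    # restricted to candidate sets that keep the graph acyclic.
    others = [j for j in parent_sets if j != i]
    candidates, scores = [], []
    for k in range(max_parents + 1):
        for combo in itertools.combinations(others, k):
            if not creates_cycle(i, combo, parent_sets):
                candidates.append(frozenset(combo))
                scores.append(bic_score(i, sorted(combo), data))
    m = max(scores)  # the empty set is always a candidate, so scores is non-empty
    weights = [math.exp(s - m) for s in scores]  # softmax over candidate sets
    parent_sets[i] = rng.choices(candidates, weights=weights)[0]

# Toy data from a noisy binary chain X0 -> X1 -> X2.
rng = random.Random(0)
data = []
for _ in range(500):
    x0 = rng.random() < 0.5
    x1 = x0 if rng.random() < 0.9 else not x0
    x2 = x1 if rng.random() < 0.9 else not x1
    data.append((int(x0), int(x1), int(x2)))

parent_sets = {0: frozenset(), 1: frozenset(), 2: frozenset()}
for _ in range(50):          # Gibbs sweeps over all nodes
    for i in parent_sets:
        gibbs_step(i, parent_sets, data, rng)
```

On this toy example the sampled graphs concentrate on the Markov equivalence class of the chain, so individual edge orientations may flip between draws while the skeleton is stable. The full method updates parent sets for several nodes jointly, which is what enables the large moves in graph space described above.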
