Structure Discovery in Bayesian Networks by Sampling Partial Orders

We present methods based on Metropolis-coupled Markov chain Monte Carlo (MC3) and annealed importance sampling (AIS) for estimating the posterior distribution of Bayesian networks. The methods draw samples from an appropriate distribution of partial orders on the nodes, followed by sampling directed acyclic graphs (DAGs) conditionally on the sampled partial orders. We show that the computations needed for the sampling algorithms are feasible as long as the encountered partial orders have relatively few down-sets. While the algorithms assume suitable modularity properties of the priors, arbitrary priors can be handled by dividing the importance weight of each sampled DAG by the number of topological sorts it has; we give a practical dynamic programming algorithm to compute these numbers. Our empirical results demonstrate that the presented partial-order-based samplers are superior to previous Markov chain Monte Carlo methods, which sample DAGs either directly or via linear orders on the nodes. The results also suggest that the convergence rates of the estimators based on AIS are competitive with those of MC3. Thus AIS is the preferred method, as it enables easier large-scale parallelization and, in addition, supplies good probabilistic lower-bound guarantees for the marginal likelihood of the model.
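
The prior correction mentioned above requires counting the topological sorts (linear extensions) of each sampled DAG. As a rough illustration of how such counts can be obtained, the following is a minimal Python sketch using memoized recursion over downward-closed subsets of the nodes; it is not the paper's dynamic programming algorithm, which exploits the structure of the encountered orders, but it implements the same basic recurrence. The function name and the input representation (a dict mapping each node to its set of parents) are illustrative choices of our own.

    from functools import lru_cache

    def count_topological_sorts(nodes, parents):
        """Count the linear extensions (topological sorts) of a DAG.

        nodes   -- iterable of node labels
        parents -- dict mapping each node to the set of its parents
        """
        node_set = frozenset(nodes)

        @lru_cache(maxsize=None)
        def count(placed):
            # 'placed' is a downward-closed set of nodes already laid out in the
            # order; any remaining node whose parents are all placed may come next.
            if placed == node_set:
                return 1
            return sum(count(placed | {v})
                       for v in node_set - placed
                       if parents.get(v, set()) <= placed)

        return count(frozenset())

    # Example: the diamond a -> b, a -> c, b -> d, c -> d has two topological sorts.
    dag_parents = {"a": set(), "b": {"a"}, "c": {"a"}, "d": {"b", "c"}}
    print(count_topological_sorts(dag_parents, dag_parents))  # prints 2

The memoization key is the set of already-placed nodes rather than the sequence, which is what makes the computation a dynamic program over down-sets instead of an enumeration of all orderings.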
