Improving Structure MCMC for Bayesian Networks through Markov Blanket Resampling

Algorithms for inferring the structure of Bayesian networks from data have become increasingly popular for uncovering the direct and indirect influences among variables in complex systems. A Bayesian approach to structure learning uses posterior probabilities to quantify how strongly the data and prior knowledge jointly support each possible graph feature. Existing Markov chain Monte Carlo (MCMC) algorithms for estimating these posterior probabilities mix and converge slowly, especially for large networks. We present a novel Markov blanket resampling (MBR) scheme that intermittently reconstructs the Markov blanket of nodes, allowing the sampler to traverse low-probability regions between local maxima more effectively. Because both the forward and backward probabilities of the MBR proposal distribution can be derived, the Metropolis-Hastings algorithm can be used to account for any asymmetries in these proposals. Experiments across a range of network sizes show that the MBR scheme outperforms other state-of-the-art algorithms in both learning performance and convergence rate. In particular, MBR achieves better learning performance than the other algorithms when the number of observations is relatively small, and faster convergence when the number of variables in the network is large.
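
The role of the Metropolis-Hastings correction for asymmetric proposals can be illustrated with a minimal Python sketch. This is not the MBR proposal itself: the three-state target, the proposal matrix, and the function names `mh_log_acceptance` and `transition_matrix` are hypothetical and chosen only for illustration. The sketch builds the exact transition matrix of a Metropolis-Hastings chain on a toy state space, which makes it possible to check numerically that the target distribution stays stationary even though the proposal is asymmetric.

```python
import math


def mh_log_acceptance(log_post_cur, log_post_new, log_q_forward, log_q_backward):
    """Log of the Metropolis-Hastings acceptance probability.

    log_q_forward  = log Q(new | cur), probability of proposing the move
    log_q_backward = log Q(cur | new), probability of proposing its reversal
    """
    log_ratio = (log_post_new - log_post_cur) + (log_q_backward - log_q_forward)
    return min(0.0, log_ratio)


def transition_matrix(target, proposal):
    """Exact transition matrix of the MH chain on a small discrete space.

    Assumes proposal[j][i] > 0 whenever proposal[i][j] > 0, so every
    proposed move has a reverse move with nonzero probability.
    """
    n = len(target)
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j or proposal[i][j] == 0.0:
                continue
            log_a = mh_log_acceptance(
                math.log(target[i]),
                math.log(target[j]),
                math.log(proposal[i][j]),
                math.log(proposal[j][i]),
            )
            P[i][j] = proposal[i][j] * math.exp(log_a)
        # Rejected proposals leave the chain in its current state.
        P[i][i] = 1.0 - sum(P[i])
    return P


# Hypothetical toy target (stand-in for a posterior over three "graphs")
# and a deliberately asymmetric proposal distribution.
target = [0.2, 0.3, 0.5]
proposal = [
    [0.0, 0.9, 0.1],
    [0.5, 0.0, 0.5],
    [0.3, 0.7, 0.0],
]

P = transition_matrix(target, proposal)
# One step of the chain applied to the target distribution.
stationary_check = [
    sum(target[i] * P[i][j] for i in range(len(target)))
    for j in range(len(target))
]
```

If the backward term were dropped from `mh_log_acceptance`, the same check would generally fail for an asymmetric proposal, which is why a move such as MBR needs both proposal directions to be computable.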
