Inference of regulatory networks with a convergence improved MCMC sampler

BackgroundOne of the goals of the Systems Biology community is to have a detailed map of all biological interactions in an organism. One small yet important step in this direction is the creation of biological networks from post-genomic data. Bayesian networks are a very promising model for the inference of regulatory networks in Systems Biology. Usually, Bayesian networks are sampled with a Markov Chain Monte Carlo (MCMC) sampler in the structure space. Unfortunately, conventional MCMC sampling schemes are often slow in mixing and convergence. To improve MCMC convergence, an alternative method is proposed and tested with different sets of data. Moreover, the proposed method is compared with the traditional MCMC sampling scheme.ResultsIn the proposed method, a simpler and faster method for the inference of regulatory networks, Graphical Gaussian Models (GGMs), is integrated into the Bayesian network inference, trough a Hierarchical Bayesian model. In this manner, information about the structure obtained from the data with GGMs is taken into account in the MCMC scheme, thus improving mixing and convergence. The proposed method is tested with three types of data, two from simulated models and one from real data. The results are compared with the results of the traditional MCMC sampling scheme in terms of network recovery accuracy and convergence. The results show that when compared with a traditional MCMC scheme, the proposed method presents improved convergence leading to better network reconstruction with less MCMC iterations.ConclusionsThe proposed method is a viable alternative to improve mixing and convergence of traditional MCMC schemes. It allows the use of Bayesian networks with an MCMC sampler with less iterations. The proposed method has always converged earlier than the traditional MCMC scheme. We observe an improvement in accuracy of the recovered networks for the Gaussian simulated data, but this improvement is absent for both real data and data simulated from ODE.

[1]  Hidde de Jong,et al.  Modeling and Simulation of Genetic Regulatory Systems: A Literature Review , 2002, J. Comput. Biol..

[2]  Adrian E. Raftery,et al.  Fast Bayesian inference for gene regulatory networks using ScanBMA , 2014, BMC Systems Biology.

[3]  Brian Godsey,et al.  Improved Inference of Gene Regulatory Networks through Integrated Bayesian Clustering and Dynamic Modeling of Time-Course Expression Data , 2013, PloS one.

[4]  Rainer Spang,et al.  Inferring cellular networks – a review , 2007, BMC Bioinformatics.

[5]  David Heckerman,et al.  A Tutorial on Learning with Bayesian Networks , 1999, Innovations in Bayesian Networks.

[6]  Olivier Ledoit,et al.  A well-conditioned estimator for large-dimensional covariance matrices , 2004 .

[7]  Nir Friedman,et al.  Being Bayesian About Network Structure. A Bayesian Approach to Structure Discovery in Bayesian Networks , 2004, Machine Learning.

[8]  W. K. Hastings,et al.  Monte Carlo Sampling Methods Using Markov Chains and Their Applications , 1970 .

[9]  K. Sachs,et al.  Causal Protein-Signaling Networks Derived from Multiparameter Single-Cell Data , 2005, Science.

[10]  D. Husmeier,et al.  Reconstructing Gene Regulatory Networks with Bayesian Networks by Combining Expression Data with Multiple Sources of Prior Knowledge , 2007, Statistical applications in genetics and molecular biology.

[11]  Marco Grzegorczyk,et al.  Comparative evaluation of reverse engineering gene regulatory networks with relevance networks, graphical gaussian models and bayesian networks , 2006, Bioinform..

[12]  Ming Zhou,et al.  Regulation of Raf-1 by direct feedback phosphorylation. , 2005, Molecular cell.

[13]  Jun S. Liu,et al.  Covariance structure of the Gibbs sampler with applications to the comparisons of estimators and augmentation schemes , 1994 .

[14]  Nir Friedman,et al.  Being Bayesian about Network Structure , 2000, UAI.

[15]  Satoru Miyano,et al.  Error tolerant model for incorporating biological knowledge with expression data in estimating gene networks , 2006 .

[16]  D. Rubin,et al.  Inference from Iterative Simulation Using Multiple Sequences , 1992 .

[17]  David Maxwell Chickering,et al.  Large-Sample Learning of Bayesian Networks is NP-Hard , 2002, J. Mach. Learn. Res..

[18]  Adriano Velasque Werhli,et al.  Reconstruction of gene regulatory networks from postgenomic data , 2007 .

[19]  Satoru Miyano,et al.  Estimating gene networks from gene expression data by combining Bayesian network model with promoter element detection , 2003, ECCB.

[20]  Marco Grzegorczyk,et al.  Improving the structure MCMC sampler for Bayesian networks by introducing a new edge reversal move , 2008, Machine Learning.

[21]  Tod S. Levitt,et al.  Uncertainty in artificial intelligence , 1988 .

[22]  J. York,et al.  Bayesian Graphical Models for Discrete Data , 1995 .

[23]  Stephen J. Roberts,et al.  Probabilistic Modeling in Bioinformatics and Medical Informatics , 2010 .

[24]  Marco Grzegorczyk,et al.  Improvements in the reconstruction of time-varying gene regulatory networks: dynamic programming and regularization by information sharing among genes , 2011, Bioinform..

[25]  John Geweke,et al.  Evaluating the accuracy of sampling-based approaches to the calculation of posterior moments , 1991 .

[26]  A. Zellner,et al.  Gibbs Sampler Convergence Criteria , 1995 .

[27]  Adrian F. M. Smith,et al.  Sampling-Based Approaches to Calculating Marginal Densities , 1990 .

[28]  Patrik D'haeseleer,et al.  Genetic network inference: from co-expression clustering to reverse engineering , 2000, Bioinform..

[29]  Donald Geman,et al.  Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images , 1984, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[30]  Dario Floreano,et al.  GeneNetWeaver: in silico benchmark generation and performance profiling of network inference methods , 2011, Bioinform..

[31]  Mark E. Borsuk,et al.  On Monte Carlo methods for Bayesian inference , 2003 .

[32]  Xiaobo Guo,et al.  Inferring Nonlinear Gene Regulatory Networks from Gene Expression Data Based on Distance Correlation , 2014, PloS one.

[33]  Bradley P. Carlin,et al.  Markov Chain Monte Carlo conver-gence diagnostics: a comparative review , 1996 .

[34]  Peter Müller,et al.  A Bayesian bivariate failure time regression model , 1998 .

[35]  Korbinian Strimmer,et al.  An empirical Bayes approach to inferring large-scale gene association networks , 2005, Bioinform..

[36]  David Heckerman,et al.  Learning Gaussian Networks , 1994, UAI.

[37]  K. Strimmer,et al.  Statistical Applications in Genetics and Molecular Biology A Shrinkage Approach to Large-Scale Covariance Matrix Estimation and Implications for Functional Genomics , 2011 .