Hidden Markov induced Dynamic Bayesian Network for recovering time evolving gene regulatory networks

Dynamic Bayesian Networks (DBNs) have been widely used to recover gene regulatory relationships from time-series data in computational systems biology. Their standard assumption is stationarity, and several research efforts have recently sought to relax this restriction. However, those methods face three challenges: long running time, low accuracy, and reliance on parameter settings. To address these problems, we propose a novel non-stationary DBN model, called HMDBN, that extends each hidden node of a Hidden Markov Model into a DBN and thereby handles the underlying time-evolving networks. Correspondingly, we propose an improved structural EM algorithm to learn the HMDBN. It dramatically reduces the search space, thereby substantially improving computational efficiency. Additionally, we derive a novel generalized Bayesian Information Criterion under the non-stationary assumption, called BWBIC, which helps to significantly improve reconstruction accuracy and to largely reduce over-fitting. Moreover, we derive re-estimation formulas for all parameters of our model, which removes the reliance on parameter settings. Compared with state-of-the-art methods, experimental evaluation of the proposed method on both synthetic and real biological data demonstrates more consistently high prediction accuracy and significantly improved computational efficiency, even without prior knowledge or parameter settings.
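To make the hidden-state idea concrete, the following is a minimal sketch of the standard scaled forward-backward recursion for a Hidden Markov Model, under the assumption that each hidden state stands for one candidate network and that the likelihood of each observation under each state is already available. It is not the HMDBN learning algorithm or the BWBIC score from the paper; the function name `state_posteriors` and the toy numbers are hypothetical and chosen only for illustration.

```python
import numpy as np

def state_posteriors(emission, transition, initial):
    """Scaled forward-backward: P(hidden state at time t | all observations).

    emission[t, k]   : likelihood of the observation at time t under state k
    transition[j, k] : probability of moving from state j to state k
    initial[k]       : probability of starting in state k
    """
    T, K = emission.shape
    alpha = np.zeros((T, K))   # scaled forward messages
    beta = np.ones((T, K))     # scaled backward messages
    scale = np.zeros(T)        # per-step normalizers to avoid underflow

    # Forward pass
    alpha[0] = initial * emission[0]
    scale[0] = alpha[0].sum()
    alpha[0] /= scale[0]
    for t in range(1, T):
        alpha[t] = (alpha[t - 1] @ transition) * emission[t]
        scale[t] = alpha[t].sum()
        alpha[t] /= scale[t]

    # Backward pass, reusing the forward normalizers
    for t in range(T - 2, -1, -1):
        beta[t] = (transition @ (emission[t + 1] * beta[t + 1])) / scale[t + 1]

    gamma = alpha * beta
    return gamma / gamma.sum(axis=1, keepdims=True)

# Hypothetical toy input: two candidate networks (states) and six time points.
# The first three observations are more likely under state 0, the last three
# under state 1.
emission = np.array([[0.9, 0.1]] * 3 + [[0.1, 0.9]] * 3)
transition = np.array([[0.9, 0.1],
                       [0.1, 0.9]])
initial = np.array([0.5, 0.5])

gamma = state_posteriors(emission, transition, initial)
print(gamma.round(3))
```

Each row of `gamma` gives, for one time point, the posterior probability that each candidate state is active. In a non-stationary setting, posteriors of this kind are what allow different network structures to be associated with different stretches of the time series.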
