Non-homogeneous dynamic Bayesian networks for continuous data

Classical dynamic Bayesian networks (DBNs) are based on the homogeneous Markov assumption and cannot deal with non-homogeneous temporal processes. Various approaches to relaxing the homogeneity assumption have recently been proposed. This paper combines a Bayesian network whose conditional probabilities belong to the linear Gaussian family with a Bayesian multiple changepoint process, in which the number and locations of the changepoints are sampled from the posterior distribution with MCMC. Our work improves on an earlier conference paper in four respects: it contains a comprehensive and self-contained exposition of the methodology; it discusses the problem of spurious feedback loops in network reconstruction; it contains a comprehensive comparative evaluation of the network reconstruction accuracy on a set of synthetic and real-world benchmark problems, based on a novel discrete changepoint process; and it suggests new and improved MCMC schemes for sampling both the network structures and the changepoint configurations from the posterior distribution. This last study compares reversible jump MCMC (RJMCMC), based on changepoint birth and death moves, with two dynamic programming schemes that were originally devised for Bayesian mixture models. We demonstrate the modifications that have to be made to allow for changing network structures, and the critical impact that the prior distribution on changepoint configurations has on the overall computational complexity.
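To make the model class concrete, the following is a minimal sketch of a single node with one parent whose linear Gaussian dependence changes at changepoints, together with a changepoint birth/death proposal. It is an illustration under stated assumptions, not the method of the paper: the function names (`segments`, `fit_segment`, `birth_or_death`) are hypothetical, a least-squares fit per segment stands in for the Bayesian marginal likelihood, and the proposal omits the acceptance ratio that a full RJMCMC sampler would need.

```python
import random


def segments(changepoints, n):
    """Split time points 1..n-1 into half-open segments [start, stop).

    Time point 0 has no predecessor, so the first segment starts at 1.
    """
    bounds = [1] + sorted(changepoints) + [n]
    return [(bounds[i], bounds[i + 1]) for i in range(len(bounds) - 1)]


def fit_segment(x, y, start, stop):
    """Least-squares fit of y[t] = a + b * x[t-1] for t in [start, stop).

    Assumption: a point estimate replaces the Bayesian treatment of the
    regression parameters, purely for illustration.
    """
    xs = [x[t - 1] for t in range(start, stop)]
    ys = [y[t] for t in range(start, stop)]
    m = len(xs)
    mean_x, mean_y = sum(xs) / m, sum(ys) / m
    sxx = sum((u - mean_x) ** 2 for u in xs)
    sxy = sum((u - mean_x) * (v - mean_y) for u, v in zip(xs, ys))
    b = sxy / sxx if sxx > 0 else 0.0
    a = mean_y - b * mean_x
    return a, b


def birth_or_death(changepoints, n, rng):
    """Propose adding (birth) or removing (death) one changepoint.

    Only the proposal is shown; no accept/reject step is performed.
    """
    cps = set(changepoints)
    free = [t for t in range(2, n) if t not in cps]
    if not cps and not free:
        return []
    if cps and (not free or rng.random() < 0.5):
        cps.remove(rng.choice(sorted(cps)))  # death move
    else:
        cps.add(rng.choice(free))  # birth move
    return sorted(cps)


if __name__ == "__main__":
    # Noise-free toy data: the regression changes at time point 5.
    x = [float(t) for t in range(10)]
    y = [0.0] + [2.0 * x[t - 1] if t < 5 else 3.0 - x[t - 1]
                 for t in range(1, 10)]
    for start, stop in segments([5], len(x)):
        print((start, stop), fit_segment(x, y, start, stop))
    print(birth_or_death([5], len(x), random.Random(0)))
```

In a full sampler, each proposed changepoint configuration would be scored by its posterior probability, and the network structure (here fixed to a single parent) would be sampled as well.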
