Improvements in the reconstruction of time-varying gene regulatory networks: dynamic programming and regularization by information sharing among genes

METHOD: Dynamic Bayesian networks (DBNs) have been applied widely to reconstruct the structure of regulatory processes from time series data, and they have established themselves as a standard modelling tool in computational systems biology. The conventional approach is based on the assumption of a homogeneous Markov chain, and many recent research efforts have focused on relaxing this restriction. A particularly popular approach combines a DBN with a multiple changepoint process and performs Bayesian inference via reversible jump Markov chain Monte Carlo (RJMCMC). In the present article, we extend this approach in two ways. First, we show that a dynamic programming scheme allows the changepoints to be sampled from the correct conditional distribution, which results in improved convergence over RJMCMC. Second, we introduce a novel Bayesian clustering and information sharing scheme among nodes, which provides a mechanism for automatic model complexity tuning.

RESULTS: We evaluate the dynamic programming scheme on expression time series for Arabidopsis thaliana genes involved in circadian regulation. In a simulation study we demonstrate that the regularization scheme improves the network reconstruction accuracy over that obtained with recently proposed inhomogeneous DBNs. For gene expression profiles from a synthetically designed Saccharomyces cerevisiae strain under switching carbon metabolism, we show that the combination of both dynamic programming and regularization yields an inference procedure that outperforms two established alternative network reconstruction methods from the biology literature.

AVAILABILITY AND IMPLEMENTATION: A MATLAB implementation of the algorithm and a supplementary paper with algorithmic details and further results for the Arabidopsis data can be downloaded from: http://www.statistik.tu-dortmund.de/bio2010.html.
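The idea of sampling changepoints exactly by dynamic programming, rather than by local RJMCMC moves, can be illustrated with a minimal sketch. It assumes, for illustration only, that one target gene is modelled within each segment by a Bayesian linear regression on the values of a fixed parent set at the previous time point, with hypothetical Gaussian prior variances `sigma2` (noise) and `tau2` (regression weights), a uniform prior on the number of segments K, and a uniform prior over changepoint placements given K. Under these assumptions a forward recursion sums over all segmentations exactly, and a backward pass draws one segmentation from its exact posterior. The function names are hypothetical, and this is not the authors' MATLAB implementation or their exact prior.

```python
import numpy as np


def segment_log_marginal(y, X, sigma2=1.0, tau2=1.0):
    """Log marginal likelihood of one segment under y = X @ beta + noise,
    with beta ~ N(0, tau2 * I) and noise ~ N(0, sigma2 * I) integrated out,
    i.e. y ~ N(0, sigma2 * I + tau2 * X @ X.T). (Assumed segment model.)"""
    m = len(y)
    cov = sigma2 * np.eye(m) + tau2 * X @ X.T
    _, logdet = np.linalg.slogdet(cov)
    quad = y @ np.linalg.solve(cov, y)
    return -0.5 * (m * np.log(2.0 * np.pi) + logdet + quad)


def _forward(seg, k_max, n):
    """f[k, j] = log of the sum, over all ways of splitting points 0..j-1
    into k consecutive segments, of the product of exp(segment scores)."""
    f = np.full((k_max + 1, n + 1), -np.inf)
    f[0, 0] = 0.0
    for k in range(1, k_max + 1):
        for j in range(1, n + 1):
            f[k, j] = np.logaddexp.reduce(f[k - 1, :j] + seg[:j, j])
    return f


def sample_changepoints(y, X, k_max, rng, min_len=2, sigma2=1.0, tau2=1.0):
    """Draw (K, changepoints) exactly from the posterior over segmentations.

    Assumed prior: K uniform on 1..k_max, placements uniform given K.
    Each changepoint is the index of the first point of a new segment.
    Requires len(y) >= min_len.
    """
    n = len(y)
    k_max = min(k_max, n // min_len)  # at most n // min_len segments fit
    # L[i, j]: log marginal likelihood of the segment covering points i..j-1
    L = np.full((n + 1, n + 1), -np.inf)
    for i in range(n):
        for j in range(i + min_len, n + 1):
            L[i, j] = segment_log_marginal(y[i:j], X[i:j], sigma2, tau2)
    f = _forward(L, k_max, n)
    # The same recursion with every admissible segment scored 0 counts the
    # placements for each K, which normalises the uniform placement prior.
    g = _forward(np.where(np.isfinite(L), 0.0, -np.inf), k_max, n)
    log_pk = f[1:, n] - g[1:, n]
    p_k = np.exp(log_pk - np.logaddexp.reduce(log_pk))
    K = int(rng.choice(k_max, p=p_k)) + 1
    # Backward pass: sample segment start points from the last segment back.
    changepoints, j = [], n
    for k in range(K, 1, -1):
        logw = f[k - 1, :j] + L[:j, j]
        w = np.exp(logw - np.logaddexp.reduce(logw))
        j = int(rng.choice(j, p=w))
        changepoints.append(j)
    return K, sorted(changepoints)


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    n = 60
    X = rng.normal(size=(n, 1))  # hypothetical parent values at time t-1
    beta = np.where(np.arange(n) < 30, 2.0, -2.0)  # effect flips at t = 30
    y = beta * X[:, 0] + 0.5 * rng.normal(size=n)
    for _ in range(3):
        print(sample_changepoints(y, X, k_max=3, rng=rng, sigma2=0.25))
```

Because each call draws a whole segmentation directly from its posterior (for a fixed parent set), there is no random-walk exploration over changepoint configurations as with local RJMCMC moves. The price is scoring O(n²) candidate segments, each needing a small matrix solve here, plus an O(k_max · n²) forward recursion.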
