Statistical inference of regulatory networks for circadian regulation

Abstract We assess the accuracy of various state-of-the-art statistical and machine learning methods for reconstructing gene and protein regulatory networks in the context of circadian regulation. Our study draws on the increasing availability of gene expression and protein concentration time series for key circadian clock components in Arabidopsis thaliana. In addition, we simulate gene expression and protein concentration time series from a recently published regulatory network of the circadian clock in A. thaliana, in which protein and gene interactions are described by a Markov jump process based on Michaelis-Menten kinetics. The simulations closely follow recent experimental protocols, including the entrainment of seedlings to different light-dark cycles and the knock-out of various key regulatory genes. Our study provides relative network reconstruction accuracy scores for a critical comparative performance evaluation, and sheds light on a series of highly relevant questions: it quantifies the influence of systematically missing values related to unknown protein concentrations and mRNA transcription rates; it investigates how performance depends on the network topology and the degree of recurrency; it provides deeper insight into when and why non-linear methods fail to outperform linear ones; it offers improved guidelines on parameter settings in different inference procedures; and it suggests new hypotheses about the structure of the central circadian gene regulatory network in A. thaliana.
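To make the simulation setting concrete, the following is a minimal sketch of a Markov jump process with a Michaelis-Menten-type saturating rate, simulated with Gillespie's direct method. It is not the published clock model: the single self-repressing gene, the function name `simulate_mjp`, and all parameter values (`v_max`, `K`, `k_tl`, `d_m`, `d_p`) are hypothetical choices made purely for illustration.

```python
import random


def simulate_mjp(t_end, v_max=20.0, K=50.0, k_tl=1.0, d_m=0.5, d_p=0.1, seed=0):
    """Gillespie simulation of a toy self-repressing gene.

    Hypothetical model (not the published network): mRNA count m and
    protein count p, with transcription repressed by the protein through
    a saturating term K / (K + p).
    """
    rng = random.Random(seed)
    t, m, p = 0.0, 0, 0
    times, mrna, protein = [t], [m], [p]
    # State change (delta m, delta p) for each of the four reactions.
    changes = [(1, 0), (-1, 0), (0, 1), (0, -1)]
    while True:
        rates = [
            v_max * K / (K + p),  # transcription, repressed by protein
            d_m * m,              # mRNA degradation
            k_tl * m,             # translation
            d_p * p,              # protein degradation
        ]
        total = sum(rates)  # always > 0 because transcription never stops
        t += rng.expovariate(total)  # exponential waiting time to next jump
        if t > t_end:
            break
        # Pick one reaction with probability proportional to its rate.
        u = rng.random() * total
        acc = 0.0
        for rate, (dm, dp) in zip(rates, changes):
            acc += rate
            if u < acc:
                m += dm
                p += dp
                break
        times.append(t)
        mrna.append(m)
        protein.append(p)
    return times, mrna, protein


if __name__ == "__main__":
    times, mrna, protein = simulate_mjp(t_end=100.0)
    print(len(times), "events; final mRNA:", mrna[-1], "final protein:", protein[-1])
```

Sampling the returned trajectories at fixed time points would yield time series of the kind that network reconstruction methods take as input; dropping the `protein` series would mimic the systematically missing protein concentrations mentioned above.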
