Overview and Evaluation of Recent Methods for Statistical Inference of Gene Regulatory Networks from Time Series Data.

A challenging problem in systems biology is the reconstruction of gene regulatory networks from postgenomic data. A variety of reverse engineering methods from machine learning and computational statistics have been proposed in the literature. However, deciding on the best method to adopt for a particular application or data set can be confusing. The present chapter provides a broad overview of state-of-the-art methods with an emphasis on conceptual understanding rather than a deluge of mathematical details, and discusses the pros and cons of the various approaches. Guidance on practical applications, with pointers to publicly available software implementations, is included. The chapter concludes with a comprehensive comparative benchmark study on simulated data and a real-world application taken from current plant systems biology.
