Gaussian process regression bootstrapping: exploring the effects of uncertainty in time course data

Motivation: Although it is widely accepted that high-throughput biological data are typically highly noisy, the effects that this uncertainty has upon the conclusions we draw from these data are often overlooked. However, in order to assign any degree of confidence to our conclusions, we must quantify these effects. Bootstrap resampling is one method by which this may be achieved. Here, we present a parametric bootstrapping approach for time-course data, in which Gaussian process regression (GPR) is used to fit a probabilistic model from which replicates may then be drawn. This approach implicitly allows the time dependence of the data to be taken into account, and is applicable to a wide range of problems.

Results: We apply GPR bootstrapping to two datasets from the literature. In the first example, we show how the approach may be used to investigate the effects of data uncertainty upon the estimation of parameters in an ordinary differential equation (ODE) model of a cell signalling pathway. Although we find that the parameter estimates inferred from the original dataset are relatively robust to data uncertainty, we also identify a distinct second set of estimates. In the second example, we use our method to show that the topology of networks constructed from time-course gene expression data appears to be sensitive to data uncertainty, although there may be individual edges in the network that are robust in light of present data.

Availability: Matlab code for performing GPR bootstrapping is available from our web site: http://www3.imperial.ac.uk/theoreticalsystemsbiology/data-software/

Contact: paul.kirk@imperial.ac.uk, m.stumpf@imperial.ac.uk

Supplementary information: Supplementary data are available at Bioinformatics online.
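As a rough illustration of the idea described above (fit a GP to a time course, then draw replicate datasets from the fitted model), the following Python sketch may help. It is not the authors' Matlab code. The function names `sq_exp_kernel` and `gpr_bootstrap`, the squared-exponential kernel, the fixed hyperparameter values, and the synthetic data are all assumptions made for this example; in practice the hyperparameters would normally be estimated from the data.

```python
import numpy as np


def sq_exp_kernel(t1, t2, signal_var, length_scale):
    """Squared-exponential covariance between two sets of time points."""
    diff = t1[:, None] - t2[None, :]
    return signal_var * np.exp(-0.5 * (diff / length_scale) ** 2)


def gpr_bootstrap(t, y, n_replicates, signal_var=1.0, length_scale=1.0,
                  noise_var=0.1, seed=0):
    """Draw replicate time courses from a GP fitted to (t, y).

    Hypothetical helper: hyperparameters are fixed here for brevity
    (an assumption), not estimated from the data.
    """
    t = np.asarray(t, dtype=float)
    y = np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    n = t.size

    y_mean = y.mean()  # constant prior mean, taken as the sample mean
    K = sq_exp_kernel(t, t, signal_var, length_scale)
    Ky = K + noise_var * np.eye(n)

    # Posterior over the latent function at the observed time points
    L = np.linalg.cholesky(Ky)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y - y_mean))
    post_mean = y_mean + K @ alpha
    V = np.linalg.solve(L, K)
    post_cov = K - V.T @ V

    # Replicates are noisy observations, so add the noise variance back
    pred_cov = post_cov + noise_var * np.eye(n)
    pred_cov = 0.5 * (pred_cov + pred_cov.T)  # enforce exact symmetry
    replicates = rng.multivariate_normal(post_mean, pred_cov,
                                         size=n_replicates)
    return post_mean, replicates


# Usage on synthetic data (illustrative only)
t_obs = np.linspace(0.0, 10.0, 12)
y_obs = np.sin(t_obs)
mean_curve, boot = gpr_bootstrap(t_obs, y_obs, n_replicates=200)

# Any downstream analysis can be repeated on each replicate; here,
# the time at which each replicate time course attains its maximum.
peak_times = t_obs[np.argmax(boot, axis=1)]
```

Because the replicates are drawn jointly from a multivariate normal whose covariance depends on the time differences, neighbouring time points stay correlated, which is how the time dependence of the data enters the resampling. The spread of a statistic such as `peak_times` across replicates then gives an indication of how sensitive that conclusion is to data uncertainty.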
