A statistical framework for biomarker discovery in metabolomic time course data

MOTIVATION Metabolomics is the study of the complement of small molecule metabolites in cells, biofluids and tissues. Many metabolomic experiments are designed to compare changes observed over time under two experimental conditions or groups (e.g. a control and a drug-treated group), with the goal of identifying discriminatory metabolites, or biomarkers, that characterize each condition. A common study design consists of repeated measurements taken on each experimental unit, thus producing time courses of all metabolites. We describe a statistical framework for estimating time-varying metabolic profiles and their within-group variability, and for detecting between-group differences. Specifically, we propose (i) a smoothing splines mixed effects (SME) model that treats each longitudinal measurement as a smooth function of time and (ii) an associated functional test statistic. Statistical significance is assessed by a non-parametric bootstrap procedure.

RESULTS The methodology has been extensively evaluated using simulated data and has been applied to real nuclear magnetic resonance spectroscopy data collected in a preclinical toxicology study as part of a larger project led by the Consortium for Metabonomic Toxicology (COMET). Our findings are consistent with previously published studies.

AVAILABILITY An R script is freely available for download at http://www2.imperial.ac.uk/~gmontana/sme.htm.
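The testing idea described under MOTIVATION (fit a smooth mean curve per group, summarize the between-group difference with a functional statistic, and calibrate it by resampling) can be illustrated with a minimal Python sketch. This is not the authors' R implementation. It rests on several assumptions that the abstract does not state: a pooled polynomial fit stands in for the SME model, the functional statistic is taken to be an L2 distance between the two fitted group mean curves, and the bootstrap resamples whole subjects from the pooled data to mimic the null hypothesis of no group difference. All function names and parameters (`mean_curve`, `l2_distance`, `bootstrap_pvalue`, `degree`, `n_boot`) are hypothetical.

```python
import numpy as np


def mean_curve(t, Y, degree=3):
    """Fit one polynomial to all subjects' observations in a group.

    Stand-in for a smoothing-spline group mean (assumption, not the SME model).
    t: (n_times,) sampling times; Y: (n_subjects, n_times) measurements.
    """
    coefs = np.polyfit(np.tile(t, Y.shape[0]), Y.ravel(), degree)
    return np.poly1d(coefs)


def l2_distance(t, Ya, Yb, degree=3, n_grid=200):
    """Approximate L2 distance between the fitted mean curves of two groups."""
    grid = np.linspace(t.min(), t.max(), n_grid)
    diff = mean_curve(t, Ya, degree)(grid) - mean_curve(t, Yb, degree)(grid)
    # Mean of squared differences times interval length approximates the integral.
    return float(np.sqrt(np.mean(diff ** 2) * (grid[-1] - grid[0])))


def bootstrap_pvalue(t, Ya, Yb, n_boot=500, degree=3, seed=0):
    """Non-parametric bootstrap p-value for the L2 statistic.

    Whole subjects are resampled with replacement from the pooled data,
    so within-subject time courses stay intact (assumed resampling scheme).
    """
    rng = np.random.default_rng(seed)
    observed = l2_distance(t, Ya, Yb, degree)
    pooled = np.vstack([Ya, Yb])
    na, nb = len(Ya), len(Yb)
    exceed = 0
    for _ in range(n_boot):
        idx = rng.integers(0, len(pooled), size=na + nb)
        stat = l2_distance(t, pooled[idx[:na]], pooled[idx[na:]], degree)
        if stat >= observed:
            exceed += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return observed, (exceed + 1) / (n_boot + 1)
```

In a study with many metabolites, a function like `bootstrap_pvalue` would be applied to each metabolite's time courses in turn, after which the resulting p-values would need a multiple-testing adjustment.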
