Statistical methods in metabolomics.

Metabolomics is a relatively new field in bioinformatics that uses measurements of metabolite abundance as a tool for disease diagnosis and other medical purposes. Although metabolomics is closely related to proteomics, its statistical analysis is potentially simpler because biochemists have significantly more domain knowledge about metabolites. This chapter reviews the challenges that metabolomics poses in the areas of quality control, statistical metrology, and data mining.
