Multivariate methods in metabolomics – from pre-processing to dimension reduction and statistical analysis

Abstract This article presents some of the multivariate methods used in metabolomics and addresses, from a data-analysis point of view, many of the data types and associated analyses arising from current instrumentation and applications. I cover most of the statistical pipeline, from pre-processing of the data through regression, classification and clustering to validation and related subjects. The emphasis is on describing the methods, their advantages and weaknesses, and their usefulness in metabolomics. The selection of methods presented is not exhaustive, but it should shed some light on some of the more popular and relevant ones.
