Interval-Based Chemometric Methods in NMR Foodomics

Classical empirical modelling requires that the number of variables be smaller than the number of observations, but developments in chemometrics and modern analytical platforms have pushed practice beyond this classical regime. Typical "omics" data sets include 100-1000 samples and often more than 10,000 variables, and the advantage of applying chemometrics to such large data structures is its ability to deal efficiently with collinear data sets that have many more variables than samples. However, the trend towards ever more variables also pushes the chemometric tools to their limit, because more variables increase the extent of spurious correlations and interferences. This chapter advocates a systematic breakdown of the variable space into intervals in order to improve the interpretability and performance of chemometric methods. The term "i-chemometrics" is introduced here to encompass the whole class of interval-based chemometric methods. The chapter describes the advantages of using generic i-chemometric methods for data preprocessing, data exploration, regression, and sample classification/discrimination, using examples from NMR foodomics. The main advantages are more parsimonious models, improved interpretability and, in many cases, improved performance.
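
The interval idea can be illustrated with a minimal sketch: split the variable axis into contiguous intervals, fit one local model per interval, and compare each interval's cross-validated error with that of the global model. This is not the chapter's implementation. The function names (split_intervals, ridge_rmsecv, interval_scan), the synthetic data and all parameter values are assumptions made for illustration, and ridge regression is used as a simple stand-in for the PLS models of interval PLS so that the example needs only NumPy.

```python
import numpy as np

def split_intervals(n_variables, n_intervals):
    """Return index arrays for contiguous, nearly equal-sized intervals."""
    return np.array_split(np.arange(n_variables), n_intervals)

def ridge_rmsecv(X, y, alpha=1.0, n_folds=5):
    """Cross-validated RMSE of a ridge model (a stand-in for PLS here)."""
    n = X.shape[0]
    residuals = np.empty(n)
    for test in np.array_split(np.arange(n), n_folds):
        train = np.setdiff1d(np.arange(n), test)
        # Centre with training statistics only, to avoid leaking test data.
        x_mean, y_mean = X[train].mean(axis=0), y[train].mean()
        Xc, yc = X[train] - x_mean, y[train] - y_mean
        # Dual form of ridge: cheap when variables outnumber samples.
        gram = Xc @ Xc.T + alpha * np.eye(len(train))
        coef = Xc.T @ np.linalg.solve(gram, yc)
        residuals[test] = y[test] - ((X[test] - x_mean) @ coef + y_mean)
    return float(np.sqrt(np.mean(residuals ** 2)))

def interval_scan(X, y, n_intervals, **kwargs):
    """Fit one local model per interval and return the error of each."""
    intervals = split_intervals(X.shape[1], n_intervals)
    return [ridge_rmsecv(X[:, idx], y, **kwargs) for idx in intervals]

# Synthetic data (assumed): 60 samples, 400 variables, and a response
# that depends only on variables 120-129.
rng = np.random.default_rng(0)
n_samples, n_variables = 60, 400
X = rng.normal(size=(n_samples, n_variables))
y = X[:, 120:130].sum(axis=1) + 0.1 * rng.normal(size=n_samples)

errors = interval_scan(X, y, n_intervals=20)
global_error = ridge_rmsecv(X, y)
best = int(np.argmin(errors))
print(f"global RMSECV: {global_error:.3f}")
print(f"best interval: {best}, RMSECV: {errors[best]:.3f}")
```

In this toy setting the interval containing the informative variables gives a much lower cross-validated error than the model built on all variables, which is the kind of gain in parsimony and interpretability the interval approach aims for. Real spectra are strongly collinear, so a latent-variable model such as PLS would normally replace the ridge stand-in.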
