Chemometric Exploration of Quantitative NMR Data

This article outlines the synergistic relationship between NMR and chemometrics. The latent variable approach used in chemometrics has proven very powerful both for inductive exploration of biological systems and for solving industrial problems effectively. This article reviews some of the most common latent variable approaches applied to the exploratory and predictive modeling of NMR data. It describes how challenging NMR data can be adapted for multivariate data analysis and how the different chemometric methods manipulate the NMR data. The different results obtained from unsupervised data exploration by principal component analysis (PCA) and multivariate curve resolution are illustrated. Many modern applications of NMR within metabolomics and quality control, by contrast, are based on supervised regression or classification analysis. This article demonstrates how these basic chemometric methods work and gives examples of how they can be optimized by variable reduction and orthogonal factor extraction. Validation methods and the assessment of classification performance by receiver operating characteristic (ROC) curves are illustrated. Finally, the potential for merging advanced multiway chemometric methods such as parallel factor analysis (PARAFAC) with the ability of NMR to record true high-order data is emphasized and illustrated by an application to 2D diffusion-edited spectra of human plasma samples.

Keywords: multivariate data analysis; latent factor methods; pattern recognition; chemometrics; PCA; PLS; PLS-DA; ROC; PARAFAC
