Classification of samples from NMR-based metabolomics using principal components analysis and partial least squares with uncertainty estimation

Abstract

Recent progress in metabolomics has been aided by the development of analysis techniques such as gas and liquid chromatography coupled with mass spectrometry (GC-MS and LC-MS) and nuclear magnetic resonance (NMR) spectroscopy. The vast quantities of data produced by these techniques have led to increased use of machine learning algorithms that aid in interpreting these data, such as principal components analysis (PCA) and partial least squares (PLS). Such techniques can be applied to biomarker discovery, interlaboratory comparison, and clinical diagnosis. However, a question remains as to whether the results of these studies can be applied to broader sets of clinical data, which usually come from different data sources. In this work, we address this question by creating a metabolomics workflow that combines a previously published consensus analysis procedure (https://doi.org/10.1016/j.chemolab.2016.12.010) with PCA and PLS models whose uncertainty is estimated by bootstrapping. This workflow is applied to NMR data from an interlaboratory comparison study using synthetic and biologically obtained metabolite mixtures. The consensus analysis identifies trusted laboratories, whose data are used to create classification models that are more reliable than models built without this screening. With uncertainty analysis, the reliability of the classification can be rigorously quantified, both for data from the original set and for new data that the model analyzes.
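The idea of attaching a bootstrap-based uncertainty to a PLS classification can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes a two-class problem with 0/1 labels, a single PLS latent variable, resampling within each class, and synthetic data standing in for binned NMR spectra. All function names and parameter values are hypothetical.

```python
import numpy as np

def fit_pls1(X, y):
    """Fit a one-latent-variable PLS1 regression of y on X."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = Xc.T @ yc                 # weight vector: covariance of each bin with y
    w /= np.linalg.norm(w)
    t = Xc @ w                    # sample scores on the latent variable
    b = (t @ yc) / (t @ t)        # regression of y on the scores
    return x_mean, y_mean, b * w

def predict_pls1(model, X):
    """Predict the continuous class score for new samples."""
    x_mean, y_mean, coef = model
    return (X - x_mean) @ coef + y_mean

def bootstrap_predictions(X, y, X_new, n_boot=200, seed=0):
    """Refit the model on bootstrap resamples and summarize the predictions.

    Returns the mean score, its standard deviation, and the fraction of
    bootstrap models that assign each new sample to class 1 (score > 0.5).
    """
    rng = np.random.default_rng(seed)
    preds = np.empty((n_boot, len(X_new)))
    for i in range(n_boot):
        # Assumption: resample with replacement within each class so that
        # every bootstrap set contains both classes.
        idx = np.concatenate([
            rng.choice(np.flatnonzero(y == c), size=int((y == c).sum()))
            for c in np.unique(y)
        ])
        preds[i] = predict_pls1(fit_pls1(X[idx], y[idx]), X_new)
    return preds.mean(axis=0), preds.std(axis=0), (preds > 0.5).mean(axis=0)

def make_demo_data(seed=1, n_per_class=20, n_bins=50):
    """Synthetic two-class 'spectra': class 1 is shifted in the first 10 bins."""
    rng = np.random.default_rng(seed)
    shift = np.zeros(n_bins)
    shift[:10] = 2.0
    X0 = rng.normal(0.0, 0.5, (n_per_class, n_bins))
    X1 = rng.normal(0.0, 0.5, (n_per_class, n_bins)) + shift
    X = np.vstack([X0, X1])
    y = np.repeat([0, 1], n_per_class)
    X_new = np.vstack([np.zeros(n_bins), shift])  # one noise-free sample per class
    return X, y, X_new

X, y, X_new = make_demo_data()
mean, std, prob = bootstrap_predictions(X, y, X_new)
for m, s, p in zip(mean, std, prob):
    print(f"score = {m:.2f} +/- {s:.2f}, fraction assigned to class 1 = {p:.2f}")
```

The spread of the bootstrap scores gives a per-sample uncertainty. A new sample whose score distribution straddles the 0.5 threshold would be flagged as an unreliable classification, while one far from the threshold would be classified with confidence.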
