PLS‐DA for compositional data with application to metabolomics

In metabolomics, quantified results are often expressed as data that carry only relative information. Vectors of such data have positive components, and all relevant information is contained in the ratios between their parts; such observations are called compositional data. The aim of the paper is to demonstrate how partial least squares discriminant analysis (PLS‐DA), the most widely used method in chemometrics for multivariate classification, can be applied to compositional data. Theoretical arguments are provided, and data sets from metabolomics related to the diagnosis of inherited metabolic disorders (IMDs) are investigated. The first example analyzes the significance of the regression parameters (metabolites) using a small data set from targeted metabolomics, where only a subset of potential markers is selected. The second example uses untargeted metabolomics, in which almost 500 metabolites were detected. The significance of the metabolites is investigated by applying PLS‐DA adapted to the compositional approach. The significance of important metabolites (markers of diseases) is more clearly visible with the compositional method in both examples. In addition, cross‐validation leads to better results when the compositional approach is used. Copyright © 2014 John Wiley & Sons, Ltd.
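
The statement that only ratios between parts matter can be illustrated with a log-ratio transformation. The abstract does not say which log-ratio coordinates the authors use, so the following is only a minimal sketch that assumes the centered log-ratio (clr) as a stand-in. The function name `clr` and the intensity values are hypothetical and chosen for illustration. The sketch shows that multiplying every part of a sample by the same factor leaves the transformed coordinates unchanged, which is the property a compositional version of PLS‐DA relies on.

```python
import math


def clr(composition):
    """Centered log-ratio: log of each part minus the mean of all log parts."""
    if any(part <= 0 for part in composition):
        raise ValueError("all parts must be strictly positive")
    logs = [math.log(part) for part in composition]
    mean_log = sum(logs) / len(logs)
    return [value - mean_log for value in logs]


# Hypothetical metabolite intensities for one sample (made-up numbers).
sample = [12.0, 3.0, 0.5, 40.0]

# The same sample at ten times the total signal: all ratios are unchanged.
rescaled = [10.0 * part for part in sample]

clr_sample = clr(sample)
clr_rescaled = clr(rescaled)

# Both versions map to the same coordinates (up to floating-point error).
same = all(math.isclose(a, b, abs_tol=1e-9) for a, b in zip(clr_sample, clr_rescaled))
print(same)
```

Coordinates of this kind, computed per sample, could then be passed to any standard PLS‐DA implementation in place of the raw intensities; whether clr or another log-ratio coordinate system is appropriate depends on the analysis.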
