Application of sparse linear discriminant analysis for metabolomics data

A major goal of metabolomics data analysis is to discover potential biomarkers that may be closely related to disease. Effective methods are therefore needed to screen these candidate biomarkers from large datasets. In this paper, we propose a strategy named sparse linear discriminant analysis (SLDA), which performs classification and variable selection simultaneously, for analyzing complex metabolomics datasets. Compared with two other approaches, partial least squares discriminant analysis (PLS-DA) and competitive adaptive reweighted sampling (CARS), SLDA obtains better results and identifies informative metabolites that are consistent with those reported in biochemical studies. Furthermore, because it builds the model on the selected features only, SLDA can be applied to high-dimensional, small-sample cases in which classical linear discriminant analysis (LDA) fails. In summary, SLDA is a useful method for exploring and processing metabolomics data.
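To make the idea of simultaneous classification and variable selection concrete, the following is a minimal sketch and not the algorithm used in the paper. It assumes a two-class problem, where a discriminant direction can be obtained by regressing class scores (+1/-1) on the features, and it makes that direction sparse with an L1 penalty solved by coordinate descent. The synthetic data, the penalty value `lam=0.3`, and all function names are hypothetical choices for illustration only.

```python
import random

def standardize(X):
    """Center each column and scale it to unit (population) variance."""
    n, p = len(X), len(X[0])
    Z = [[0.0] * p for _ in range(n)]
    for j in range(p):
        col = [X[i][j] for i in range(n)]
        mean = sum(col) / n
        std = (sum((v - mean) ** 2 for v in col) / n) ** 0.5 or 1.0
        for i in range(n):
            Z[i][j] = (col[i] - mean) / std
    return Z

def soft_threshold(value, lam):
    """Shrink value toward zero by lam; anything in [-lam, lam] becomes 0."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0

def sparse_discriminant(Z, scores, lam, n_iter=200):
    """Minimize (1/(2n))||scores - Z b||^2 + lam*||b||_1 by coordinate descent.

    Assumes the columns of Z are standardized (mean 0, variance 1).
    """
    n, p = len(Z), len(Z[0])
    beta = [0.0] * p
    resid = list(scores)  # residual of the current fit
    for _ in range(n_iter):
        for j in range(p):
            # Covariance of feature j with the residual, with feature j's
            # own current contribution added back.
            rho = sum(Z[i][j] * resid[i] for i in range(n)) / n + beta[j]
            new = soft_threshold(rho, lam)
            if new != beta[j]:
                delta = new - beta[j]
                for i in range(n):
                    resid[i] -= delta * Z[i][j]
                beta[j] = new
    return beta

# Hypothetical data: 20 samples, 50 variables (more variables than samples),
# with only variables 0 and 1 carrying class information.
rng = random.Random(0)
n, p = 20, 50
labels = [1.0 if i < n // 2 else -1.0 for i in range(n)]
X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
for i in range(n):
    X[i][0] += 3.0 * labels[i]
    X[i][1] += 3.0 * labels[i]

Z = standardize(X)
beta = sparse_discriminant(Z, labels, lam=0.3)

# Variables with a nonzero coefficient are the "selected" ones.
selected = [j for j, b in enumerate(beta) if b != 0.0]
predicted = [1.0 if sum(z * b for z, b in zip(row, beta)) > 0 else -1.0
             for row in Z]
accuracy = sum(pr == y for pr, y in zip(predicted, labels)) / n

print("selected variables:", selected)
print("training accuracy:", accuracy)
```

Because the penalty sets most coefficients exactly to zero, the fitted direction doubles as a variable-selection result, and no inversion of the within-class covariance matrix is required, which is the step that breaks down for classical LDA when there are more variables than samples. The accuracy printed here is on the training data only and would need cross-validation before being read as predictive performance.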
