Sparse non-negative generalized PCA with applications to metabolomics

MOTIVATION: Nuclear magnetic resonance (NMR) spectroscopy has been used to study mixtures of metabolites in biological samples. The technology produces a spectrum for each sample, depicting the chemical shifts at which an unknown number of latent metabolites resonate. Interpreting these data with common multivariate exploratory methods such as principal components analysis (PCA) is difficult for three reasons: the data are high-dimensional, the underlying spectra are non-negative, and intensities at adjacent chemical shifts are dependent.

RESULTS: We develop a novel modification of PCA suited to the analysis of NMR data, called Sparse Non-Negative Generalized PCA. The method yields interpretable principal components and loading vectors that select important features and directly account for both the non-negativity of the underlying spectra and the dependencies between adjacent chemical shifts. Through a reanalysis of experimental NMR data on five purified neural cell types, we demonstrate the utility of our methods for dimension reduction, pattern recognition, sample exploration and feature selection. Our methods lead to the identification of novel metabolites that reflect the differences between these cell types.

AVAILABILITY: www.stat.rice.edu/~gallen/software.html

CONTACT: gallen@rice.edu

SUPPLEMENTARY INFORMATION: Supplementary data are available at Bioinformatics online.
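The RESULTS paragraph describes the method only in words. The following is a minimal sketch of one plausible single-component formulation, not the authors' released software. It maximises u'XRv - lam*||v||_1 subject to u'u <= 1, v'Rv <= 1 and v >= 0. Several choices here are assumptions made for illustration: the sample-side operator is taken to be the identity, the variable-side operator R is a hypothetical Gaussian kernel over chemical-shift indices (standing in for "dependencies at adjacent chemical shifts"), the data are not centred, and the names `neighbor_kernel`, `nonneg_quadratic_cd` and `sparse_nonneg_gpca_rank1` are hypothetical. The sketch alternates a normalised power step for u with an R-weighted non-negative lasso for v, solved by coordinate descent.

```python
import numpy as np


def neighbor_kernel(p: int, bandwidth: float = 2.0) -> np.ndarray:
    """Hypothetical operator R: a Gaussian kernel on the distance between
    chemical-shift indices, so that adjacent shifts are treated as related.
    Symmetric, positive semidefinite, with a unit diagonal."""
    idx = np.arange(p)
    diff = idx[:, None] - idx[None, :]
    return np.exp(-(diff ** 2) / (2.0 * bandwidth ** 2))


def nonneg_quadratic_cd(R, c, v0=None, n_sweeps=500, tol=1e-10):
    """Minimise 0.5 * v'Rv - c'v subject to v >= 0 by cyclic coordinate
    descent; each step solves the one-dimensional problem exactly."""
    v = np.zeros(c.shape[0]) if v0 is None else v0.copy()
    Rv = R @ v  # kept in sync with v so each coordinate update costs O(p)
    for _ in range(n_sweeps):
        max_change = 0.0
        for j in range(c.shape[0]):
            new = max(0.0, v[j] - (Rv[j] - c[j]) / R[j, j])
            delta = new - v[j]
            if delta != 0.0:
                Rv += delta * R[:, j]
                v[j] = new
                max_change = max(max_change, abs(delta))
        if max_change < tol:
            break
    return v


def sparse_nonneg_gpca_rank1(X, R, lam, n_iter=100, tol=1e-8):
    """Alternating updates for one component of
        max u'XRv - lam * ||v||_1  s.t.  u'u <= 1, v'Rv <= 1, v >= 0.
    Returns the sample scores u, the loading v and d = u'XRv."""
    # Initialise with |leading right singular vector|, rescaled to unit R-norm.
    v = np.abs(np.linalg.svd(X, full_matrices=False)[2][0])
    v /= np.sqrt(v @ R @ v)
    u = np.zeros(X.shape[0])
    v_hat = None
    for _ in range(n_iter):
        xrv = X @ (R @ v)
        if np.linalg.norm(xrv) == 0.0:
            break
        u = xrv / np.linalg.norm(xrv)
        # With u fixed, the v-step is an R-weighted non-negative lasso on
        # y = X'u, i.e. min 0.5 * ||y - v||_R^2 + lam * sum(v), v >= 0,
        # followed by rescaling to unit R-norm (warm-started from last v_hat).
        c = R @ (X.T @ u) - lam
        v_hat = nonneg_quadratic_cd(R, c, v_hat)
        norm_v = np.sqrt(v_hat @ R @ v_hat)
        if norm_v == 0.0:  # penalty removes every feature
            return u, np.zeros_like(v), 0.0
        v_new = v_hat / norm_v
        step = np.linalg.norm(v_new - v)
        v = v_new
        if step < tol:
            break
    return u, v, float(u @ X @ R @ v)
```

For example, `u, v, d = sparse_nonneg_gpca_rank1(X, neighbor_kernel(X.shape[1]), lam=1.0)` returns a non-negative loading whose zero entries mark chemical shifts dropped by the penalty. Larger `lam` gives sparser loadings. Extracting further components would require a deflation step, which this sketch does not show.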
