Regularized partial least squares with an application to NMR spectroscopy

High-dimensional data, common in genomics, proteomics, and chemometrics, often contain complicated correlation structures. Recently, partial least squares (PLS) and sparse PLS methods have gained attention in these areas as dimension reduction techniques for supervised data analysis. We introduce a framework for Regularized PLS that solves a relaxation of the SIMPLS optimization problem with penalties on the PLS loading vectors. Our approach offers several advantages, including flexibility, support for general penalties, easy interpretation of results, and fast computation in high-dimensional settings. We also outline extensions that lead to novel methods for non-negative PLS and generalized PLS, an adaptation of PLS for structured data. We demonstrate the utility of our methods through simulations and a case study on proton Nuclear Magnetic Resonance (NMR) spectroscopy data.
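To make the idea of a penalized loading vector concrete, the following is a minimal sketch and not the authors' implementation. It assumes an L1 (lasso) penalty, a univariate response, and only the first PLS component. Under those assumptions, maximizing v'X'y - lam * ||v||_1 subject to v'v <= 1 has a closed-form solution: soft-threshold the cross-product X'y and rescale to unit length. The function names `soft_threshold` and `sparse_pls_loading` are hypothetical, and the deflation step needed for further components is omitted.

```python
import numpy as np


def soft_threshold(z, lam):
    """Elementwise soft-thresholding: shrink each entry toward zero by lam."""
    return np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)


def sparse_pls_loading(X, y, lam):
    """First L1-penalized PLS loading vector for a univariate response.

    Illustrative assumption: the loading maximizes v' X'y - lam * ||v||_1
    subject to v'v <= 1, whose solution is the soft-thresholded
    cross-product X'y rescaled to unit Euclidean norm.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    Xc = X - X.mean(axis=0)          # center each predictor
    yc = y - y.mean()                # center the response
    m = Xc.T @ yc                    # cross-product vector X'y
    v = soft_threshold(m, lam)
    norm = np.linalg.norm(v)
    if norm == 0.0:
        return v                     # penalty removed every variable
    return v / norm


# Small example: three predictors, four observations.
X = np.array([[1.0, 1.0, 0.0],
              [2.0, -1.0, 0.0],
              [3.0, 1.0, 1.0],
              [4.0, -1.0, 1.0]])
y = np.array([1.0, 2.0, 3.0, 4.0])
v = sparse_pls_loading(X, y, lam=2.5)
```

With `lam=0` the result is the ordinary unit-norm first PLS direction for a single response. Increasing `lam` sets more entries of the loading vector exactly to zero, which is what makes the selected variables easy to interpret.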
