Weighted sparse principal component analysis

Abstract: Sparse principal component analysis (SPCA) has been shown to be a fruitful method for the analysis of high-dimensional data. So far, however, no method has been proposed that allows elementwise weights to be assigned to the matrix of residuals, although this may have several useful applications. We propose a novel SPCA method that offers the flexibility to weight at the level of the individual elements of the data matrix. The superior performance of the weighted SPCA approach compared to unweighted SPCA is shown for data simulated according to the prevailing multiplicative-additive error model. In addition, applying weighted SPCA to genome-wide transcription rates obtained soon after vaccination resulted in a biologically meaningful selection of variables, with components that are associated with the measured vaccine efficacy. The MATLAB implementation of the weighted sparse PCA method is freely available from https://github.com/katrijnvandeun/WSPCA.
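
To make the idea of elementwise weighting concrete, the following is a minimal Python sketch, not the authors' MATLAB implementation. It assumes a two-component error model of the form x = mu * exp(eta) + eps with independent normal eta and eps, and it assumes that each weight is the inverse of the corresponding error variance; the abstract specifies neither the exact model parameterization nor the weighting convention, so both are illustrative assumptions. All function names are hypothetical.

```python
import math
import random


def simulate_two_component(true_values, sd_additive, sd_multiplicative, seed=0):
    """Simulate x = mu * exp(eta) + eps for every entry mu of a matrix.

    eta ~ N(0, sd_multiplicative^2) is the multiplicative error and
    eps ~ N(0, sd_additive^2) is the additive error (assumed model).
    """
    rng = random.Random(seed)
    return [
        [
            mu * math.exp(rng.gauss(0.0, sd_multiplicative))
            + rng.gauss(0.0, sd_additive)
            for mu in row
        ]
        for row in true_values
    ]


def inverse_variance_weights(true_values, sd_additive, sd_multiplicative):
    """Elementwise weights 1 / Var(x_ij) under the assumed model.

    For eta ~ N(0, s^2), Var(exp(eta)) = exp(s^2) * (exp(s^2) - 1), so
    Var(x) = mu^2 * exp(s^2) * (exp(s^2) - 1) + sd_additive^2.
    """
    s2 = sd_multiplicative ** 2
    var_mult = math.exp(s2) * (math.exp(s2) - 1.0)
    return [
        [1.0 / (mu ** 2 * var_mult + sd_additive ** 2) for mu in row]
        for row in true_values
    ]


def weighted_sse(data, fitted, weights):
    """Weighted residual sum of squares: sum_ij w_ij * (x_ij - fitted_ij)^2."""
    return sum(
        w * (x - f) ** 2
        for row_x, row_f, row_w in zip(data, fitted, weights)
        for x, f, w in zip(row_x, row_f, row_w)
    )


if __name__ == "__main__":
    truth = [[0.0, 10.0, 100.0], [5.0, 50.0, 500.0]]
    noisy = simulate_two_component(truth, sd_additive=1.0, sd_multiplicative=0.1)
    weights = inverse_variance_weights(truth, sd_additive=1.0, sd_multiplicative=0.1)
    # Large true values carry more multiplicative noise and so get smaller weights.
    print(weighted_sse(noisy, truth, weights))
```

In this sketch, entries with large true values are down-weighted because their multiplicative error dominates, whereas an unweighted analysis would treat every residual alike. A weighted sparse PCA would minimize a loss of this weighted form, plus a sparsity penalty on the loadings, over a low-rank model rather than comparing against the known truth as done here.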
