Stochastic convex sparse principal component analysis

Principal component analysis (PCA) is a widely used tool for dimensionality reduction and data analysis. The main idea of PCA is to represent high-dimensional data with a few representative components that capture most of the variance in the data. However, traditional PCA has an obvious drawback when interpretability matters: in applications where the features carry physical meaning, the principal components extracted by conventional PCA are hard to interpret because each component is a linear combination of all the original features. Sparse PCA was therefore proposed to improve the interpretability of traditional PCA by introducing sparsity into the loading vectors of the principal components. Sparse PCA can be formulated as an ℓ1-regularized optimization problem, which can be solved by proximal gradient methods; however, these methods scale poorly because they generally require computing the exact gradient at each iteration. The stochastic gradient framework addresses this challenge by using an unbiased stochastic estimate of the gradient at each iteration. Nevertheless, stochastic approaches typically converge slowly due to the high variance of these estimates. In this paper, we propose a convex sparse principal component analysis (Cvx-SPCA), which leverages a proximal variance-reduced stochastic scheme to achieve a geometric convergence rate. We further show that the convergence analysis can be significantly simplified by using a weak condition, which allows a broader class of objectives to be handled. The efficiency and effectiveness of the proposed method are demonstrated on a large-scale electronic medical record cohort.
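The ℓ1-regularized formulation mentioned above is handled by proximal gradient methods through the proximal operator of the ℓ1 norm, which has a closed form: elementwise soft-thresholding. The sketch below is illustrative only (the function name and example values are not from the paper); it shows how soft-thresholding zeroes out small entries of a loading vector, which is the mechanism that produces sparse principal components.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1: elementwise soft-thresholding."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

# Example: shrinking a dense loading vector toward a sparse one.
v = np.array([0.9, -0.2, 0.05, -1.4])
w = soft_threshold(v, 0.3)  # entries with |v_i| <= 0.3 become exactly zero
```

Each surviving entry is also shrunk toward zero by the threshold, which is what couples the ℓ1 penalty strength to the sparsity level of the loadings.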

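To make the "proximal variance-reduced stochastic scheme" concrete, here is a minimal sketch in the spirit of prox-SVRG: an outer loop computes a full gradient at a snapshot point, and an inner loop corrects cheap per-sample gradients with that snapshot so their variance shrinks as the iterates converge. This is not the paper's Cvx-SPCA objective; as a stand-in it uses an ℓ1-regularized least-squares problem, and all names and parameters are illustrative assumptions.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (elementwise soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def prox_svrg(X, y, lam, step=0.01, epochs=50, seed=0):
    """Prox-SVRG sketch for min_w (1/2n)||Xw - y||^2 + lam*||w||_1.

    Outer loop: full gradient at a snapshot w_snap.
    Inner loop: variance-reduced stochastic gradient + l1 proximal step.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        w_snap = w.copy()
        full_grad = X.T @ (X @ w_snap - y) / n      # exact gradient, once per epoch
        for _ in range(n):
            i = rng.integers(n)
            gi = X[i] * (X[i] @ w - y[i])            # stochastic gradient at w
            gi_snap = X[i] * (X[i] @ w_snap - y[i])  # same sample at the snapshot
            g = gi - gi_snap + full_grad             # variance-reduced estimate
            w = soft_threshold(w - step * g, step * lam)
    return w

# Tiny demo: recover a sparse vector from noisy linear measurements.
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 10))
w_true = np.zeros(10)
w_true[:3] = [2.0, -1.5, 1.0]
y = X @ w_true + 0.01 * rng.standard_normal(200)
w_hat = prox_svrg(X, y, lam=0.05, step=0.01, epochs=50)
```

Because the correction term `gi - gi_snap` vanishes as `w` approaches `w_snap`, the gradient estimate's variance decays with the optimization error, which is what allows a constant step size and the geometric (linear) convergence rate that plain stochastic proximal gradient lacks.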