An augmented Lagrangian approach for sparse principal component analysis

Principal component analysis (PCA) is a widely used technique for data analysis and dimension reduction, with numerous applications in science and engineering. However, standard PCA suffers from the fact that the principal components (PCs) are usually linear combinations of all the original variables, which makes the PCs difficult to interpret. To alleviate this drawback, various sparse PCA approaches have been proposed in the literature (Cadima and Jolliffe in J Appl Stat 22:203–214, 1995; d’Aspremont et al. in J Mach Learn Res 9:1269–1294, 2008; d’Aspremont et al. in SIAM Rev 49:434–448, 2007; Jolliffe in J Appl Stat 22:29–35, 1995; Journée et al. in J Mach Learn Res 11:517–553, 2010; Jolliffe et al. in J Comput Graph Stat 12:531–547, 2003; Moghaddam et al. in Advances in Neural Information Processing Systems 18:915–922, MIT Press, Cambridge, 2006; Shen and Huang in J Multivar Anal 99(6):1015–1034, 2008; Zou et al. in J Comput Graph Stat 15(2):265–286, 2006). Despite their success in achieving sparsity, these methods lose some important properties enjoyed by standard PCA, such as the uncorrelatedness of the PCs and the orthogonality of the loading vectors. Moreover, the total explained variance that they attempt to maximize can be too optimistic. In this paper we propose a new formulation for sparse PCA that aims to find sparse and nearly uncorrelated PCs with orthogonal loading vectors while explaining as much of the total variance as possible. We also develop a novel augmented Lagrangian method for solving a class of nonsmooth constrained optimization problems, which is well suited to our formulation of sparse PCA. We show that it converges to a feasible point and, under some regularity assumptions, to a stationary point. Additionally, we propose two nonmonotone gradient methods for solving the augmented Lagrangian subproblems and establish their global and local convergence. Finally, we compare our sparse PCA approach with several existing methods on synthetic (Zou et al. in J Comput Graph Stat 15(2):265–286, 2006), Pitprops (Jeffers in Appl Stat 16:225–236, 1967), and gene expression data (Chin et al. in Cancer Cell 10:529–541, 2006). The computational results demonstrate that the sparse PCs produced by our approach substantially outperform those produced by other methods in terms of total explained variance, correlation of PCs, and orthogonality of loading vectors. Moreover, experiments on random data show that our method can solve large-scale problems within a reasonable amount of time.
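To make the augmented Lagrangian scheme concrete, the following is a minimal sketch of the kind of outer loop the abstract describes, applied to an illustrative ℓ1-penalized sparse PCA model with an orthogonality constraint V'V = I. The abstract does not spell out the paper's exact formulation, penalty-update rule, or inner solver, so the model below, the fixed step size, the 0.25 feasibility-progress test, and the helper `soft_threshold` are illustrative assumptions, not the authors' method.

```python
import numpy as np

def soft_threshold(X, t):
    """Elementwise soft-thresholding: the proximal map of t * ||X||_1."""
    return np.sign(X) * np.maximum(np.abs(X) - t, 0.0)

def augmented_lagrangian_spca(Sigma, r, rho, mu=1.0, gamma=10.0,
                              outer_iters=50, inner_iters=200,
                              step=1e-2, tol=1e-4, seed=0):
    """Sketch of an augmented Lagrangian outer loop for the illustrative model

        min_V  -trace(V' Sigma V) + rho * ||V||_1   s.t.  V'V = I_r,

    i.e., maximize explained variance with sparse, orthogonal loadings.
    Only the generic scheme is shown: approximately minimize the augmented
    Lagrangian in V, update the multipliers, and enlarge the penalty
    parameter when feasibility stalls.
    """
    n = Sigma.shape[0]
    rng = np.random.default_rng(seed)
    V = np.linalg.qr(rng.standard_normal((n, r)))[0]  # orthonormal start
    Lam = np.zeros((r, r))        # multipliers for the constraint V'V = I
    prev_viol = np.inf
    for _ in range(outer_iters):
        # Inner phase: proximal-gradient steps on the augmented Lagrangian
        # (a stand-in for the paper's nonmonotone gradient methods).
        for _ in range(inner_iters):
            C = V.T @ V - np.eye(r)                   # constraint residual
            grad = -2.0 * Sigma @ V + 2.0 * V @ (Lam + mu * C)
            V = soft_threshold(V - step * grad, step * rho)
        viol = np.linalg.norm(V.T @ V - np.eye(r))
        if viol < tol:
            break
        Lam = Lam + mu * (V.T @ V - np.eye(r))        # multiplier update
        if viol > 0.25 * prev_viol:                   # feasibility stalled:
            mu *= gamma                               # enlarge penalty
        prev_viol = viol
    return V

# Usage on a small random covariance matrix:
A = np.random.default_rng(1).standard_normal((100, 20))
Sigma = A.T @ A / 100.0
V = augmented_lagrangian_spca(Sigma, r=3, rho=0.5)
print(np.round(V.T @ V, 3))  # close to the identity if the loop converged
```

In the paper the subproblems are solved by the proposed nonmonotone gradient methods (cf. the Barzilai–Borwein-type step sizes of reference [14]); the plain fixed-step proximal-gradient iterations above are only a placeholder for that inner solver.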

[1] K. Fan, On a Theorem of Weyl Concerning Eigenvalues of Linear Transformations I, 1949, Proceedings of the National Academy of Sciences of the United States of America.

[2] Michael I. Jordan et al., A Direct Formulation for Sparse PCA Using Semidefinite Programming, 2004, SIAM Rev.

[3] Renato D. C. Monteiro et al., A nonlinear programming algorithm for solving semidefinite programs via low-rank factorization, 2003, Math. Program.

[4] Marc Teboulle et al., A Fast Iterative Shrinkage-Thresholding Algorithm for Linear Inverse Problems, 2009, SIAM J. Imaging Sci.

[5] Alexandre d'Aspremont et al., Optimal Solutions for Sparse Principal Component Analysis, 2007, J. Mach. Learn. Res.

[6] D. Ruppert, The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2004.

[7] Yurii Nesterov et al., Generalized Power Method for Sparse Principal Component Analysis, 2008, J. Mach. Learn. Res.

[8] U. Helmke et al., Optimization and Dynamical Systems, 1994, Proceedings of the IEEE.

[9] Toru Maruyama, On Some Developments in Convex Analysis (in Japanese), 1977.

[10] Ajay N. Jain et al., Genomic and transcriptional aberrations linked to breast cancer pathophysiologies, 2006, Cancer Cell.

[11] S. M. Robinson, Local structure of feasible sets in nonlinear programming, 1983.

[12] D. Botstein et al., Singular value decomposition for genome-wide expression data processing and modeling, 2000, Proceedings of the National Academy of Sciences of the United States of America.

[13] Shai Avidan et al., Spectral Bounds for Sparse PCA: Exact and Greedy Algorithms, 2005, NIPS.

[14] J. Borwein et al., Two-Point Step Size Gradient Methods, 1988.

[15] Paul Tseng et al., A coordinate gradient descent method for nonsmooth separable minimization, 2008, Math. Program.

[16] Ash A. Alizadeh et al., 'Gene shaving' as a method for identifying distinct sets of genes with similar expression patterns, 2000, Genome Biology.

[17] J. N. R. Jeffers, Two Case Studies in the Application of Principal Component Analysis, 1967.

[18] V. Bruce et al., Face processing: Human perception and principal components analysis, 1996, Memory & Cognition.

[19] Y. Nesterov, Gradient methods for minimizing composite objective function, 2007.

[20] K. Schittkowski, Nonlinear Programming, 2022.

[21] S. M. Robinson, Stability Theory for Systems of Inequalities, Part II: Differentiable Nonlinear Systems, 1976.

[22] Ying Xiong, Nonlinear Optimization, 2014.

[23] José Mario Martínez et al., Nonmonotone Spectral Projected Gradient Methods on Convex Sets, 1999, SIAM J. Optim.

[24] J. Hiriart-Urruty, Convex analysis and minimization algorithms, 1993.

[25] I. Jolliffe, Rotation of principal components: choice of normalization constraints, 1995.

[26] R. Tibshirani et al., Sparse Principal Component Analysis, 2006.

[27] Jianhua Z. Huang, Sparse principal component analysis via regularized low rank matrix approximation, 2008.

[28] Jorge Cadima, Loading and correlations in the interpretation of principal components, 1995.

[29] Stephen J. Wright et al., Sparse Reconstruction by Separable Approximation, 2008, IEEE Transactions on Signal Processing.

[30] I. Jolliffe et al., A Modified Principal Component Technique Based on the LASSO, 2003.

[31] Michael L. Overton et al., Optimality conditions and duality theory for minimizing sums of the largest eigenvalues of symmetric matrices, 2015, Math. Program.