Generalized Power Method for Sparse Principal Component Analysis

In this paper we develop a new approach to sparse principal component analysis (sparse PCA). We propose two single-unit and two block optimization formulations of the sparse PCA problem, aimed at extracting a single sparse dominant principal component of a data matrix or several components at once, respectively. While the initial formulations involve nonconvex functions and are therefore hard to solve directly, we rewrite them as optimization programs that maximize a convex function on a compact set. When the data matrix has many more columns (variables) than rows, this reformulation shrinks the dimension of the search space enormously. We then propose and analyze a simple gradient method suited for the task. The algorithm turns out to have its best convergence properties when either the objective function or the feasible set is strongly convex; this holds for our single-unit formulations and can be enforced in the block case. Finally, we demonstrate numerically on a set of random and gene expression test problems that our approach outperforms existing algorithms both in the quality of the obtained solutions and in computational speed.
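The abstract does not spell out the formulations or the gradient method, so the following is only an illustrative sketch under stated assumptions. It assumes an l1-penalized single-unit formulation in which the nonconvex problem is replaced by maximizing the convex function f(x) = Σ_i [|a_iᵀx| − γ]₊² over the unit sphere in R^p, where a_i is the i-th column of a p × n data matrix A and γ ≥ 0 controls sparsity. The names `l1_objective` and `gpower_l1_single_unit`, the starting point (the largest-norm column) and the stopping rule are all hypothetical choices made for this sketch. Each step replaces x by the normalized gradient of f, a power-method-like update; because f is convex this cannot decrease f, and the search runs over R^p rather than R^n, which illustrates the dimension reduction when p ≪ n.

```python
import math
import random


def l1_objective(A, x, gamma):
    """Assumed objective f(x) = sum_i max(|a_i^T x| - gamma, 0)^2.

    A is a p-by-n matrix stored as a list of p rows; a_i is column i; x has length p.
    """
    p, n = len(A), len(A[0])
    total = 0.0
    for i in range(n):
        s = sum(A[r][i] * x[r] for r in range(p))  # a_i^T x
        total += max(abs(s) - gamma, 0.0) ** 2
    return total


def gpower_l1_single_unit(A, gamma, iters=1000, tol=1e-10):
    """Hypothetical sketch: power-type gradient ascent of f over the unit sphere in R^p.

    Returns (x, z): x is the unit vector in R^p reached by the iteration and
    z is the recovered (possibly sparse) unit loading vector in R^n.
    """
    if gamma < 0:
        raise ValueError("gamma must be nonnegative")
    p, n = len(A), len(A[0])
    cols = [[A[r][i] for r in range(p)] for i in range(n)]  # columns a_1..a_n

    # Assumed start: the largest-norm column, scaled to unit length.
    norms = [math.sqrt(sum(v * v for v in a)) for a in cols]
    i_max = max(range(n), key=norms.__getitem__)
    x = [v / norms[i_max] for v in cols[i_max]]

    # If gamma >= max_i ||a_i||, then |a_i^T x| <= gamma on the whole sphere, so z = 0.
    if gamma >= norms[i_max]:
        return x, [0.0] * n

    for _ in range(iters):
        # Half the gradient of f: sum_i [|a_i^T x| - gamma]_+ * sgn(a_i^T x) * a_i.
        g = [0.0] * p
        for a in cols:
            s = sum(ar * xr for ar, xr in zip(a, x))
            w = max(abs(s) - gamma, 0.0)
            if w > 0.0:
                c = math.copysign(w, s)
                for r in range(p):
                    g[r] += c * a[r]
        # g is nonzero because <g, x> = sum_i w_i |a_i^T x| > 0 while f(x) > 0,
        # and convexity of f means the normalized gradient step cannot decrease f.
        g_norm = math.sqrt(sum(v * v for v in g))
        x_new = [v / g_norm for v in g]
        done = max(abs(u - v) for u, v in zip(x_new, x)) < tol
        x = x_new
        if done:
            break

    # Soft-threshold the correlations a_i^T x, then normalize to get the loadings.
    z = []
    for a in cols:
        s = sum(ar * xr for ar, xr in zip(a, x))
        z.append(math.copysign(max(abs(s) - gamma, 0.0), s))
    z_norm = math.sqrt(sum(v * v for v in z))
    return x, [v / z_norm for v in z]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical random data: p = 5 rows, n = 30 columns (variables).
    A = [[rng.gauss(0.0, 1.0) for _ in range(30)] for _ in range(5)]
    x, z = gpower_l1_single_unit(A, gamma=0.8)
    print("nonzero loadings:", sum(1 for v in z if v != 0.0), "of", len(z))
```

Each iteration costs O(pn) operations for a dense matrix; a NumPy version would compute the correlations as `A.T @ x` and the gradient as `A @ w` instead of looping in Python.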
