Tight convex relaxations for sparse matrix factorization

Based on a new atomic norm, we propose a convex formulation for sparse matrix factorization problems in which the number of non-zero elements of the factors is assumed fixed and known. Potential applications include sparse PCA with multiple factors, subspace clustering, and low-rank sparse bilinear regression. We derive slow rates and an upper bound on the statistical dimension [1] of the proposed norm for rank-1 matrices, showing that its statistical dimension is an order of magnitude smaller than those of the usual ℓ1-norm, trace norm, and their combinations. Even though the convex formulation is computationally hard in general and does not lead to provably polynomial-time algorithmic schemes, we propose an active-set algorithm that leverages the structure of the convex problem to solve it, and we report promising numerical results.
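To make the atomic-norm viewpoint concrete, here is a minimal NumPy sketch (not the paper's algorithm; the function names `truncate` and `best_sparse_rank1_atom` and the parameters `k`, `q` are ours) of the atom-selection subproblem that an active-set scheme for such a formulation must solve repeatedly: finding a rank-1 matrix u v^T with k-sparse u and q-sparse v that is maximally correlated with a given matrix. Since this subproblem is hard in general, the sketch uses a simple truncated alternating power heuristic in the spirit of [16] rather than an exact solver.

```python
import numpy as np

def truncate(x, k):
    """Keep the k largest-magnitude entries of x, zero the rest, renormalize."""
    z = np.zeros_like(x)
    idx = np.argsort(np.abs(x))[-k:]
    z[idx] = x[idx]
    norm = np.linalg.norm(z)
    return z / norm if norm > 0 else z

def best_sparse_rank1_atom(Z, k, q, n_iter=100, seed=0):
    """Heuristic search for a (k, q)-sparse rank-1 atom u v^T (unit-norm,
    u k-sparse, v q-sparse) with large inner product <Z, u v^T>, via
    alternating truncated power iterations. Returns a local solution only."""
    rng = np.random.default_rng(seed)
    m, n = Z.shape
    v = truncate(rng.standard_normal(n), q)
    u = truncate(Z @ v, k)
    for _ in range(n_iter):
        v = truncate(Z.T @ u, q)
        u = truncate(Z @ v, k)
    return u, v, float(u @ Z @ v)

if __name__ == "__main__":
    # Planted (k, q)-sparse rank-1 signal plus Gaussian noise.
    m, n, k, q = 50, 40, 5, 5
    rng = np.random.default_rng(1)
    u0 = truncate(rng.standard_normal(m), k)
    v0 = truncate(rng.standard_normal(n), q)
    Z = 5.0 * np.outer(u0, v0) + 0.5 * rng.standard_normal((m, n))
    u, v, val = best_sparse_rank1_atom(Z, k, q)
    print("correlation with planted atom:", abs(u @ u0) * abs(v @ v0),
          "objective:", val)
```

In an active-set method, a routine of this kind would be called to propose new atoms to add to the working set, with the convex combination over the current atoms then re-optimized; the heuristic nature of the atom search is exactly where the hardness discussed above shows up.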

[1] Christos Thrampoulidis et al. The squared-error of generalized LASSO: A precise analysis, 2013, 51st Annual Allerton Conference on Communication, Control, and Computing (Allerton).

[2] Mark Jerrum et al. Large Cliques Elude the Metropolis Process, 1992, Random Struct. Algorithms.

[3] R. Tibshirani et al. A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis, 2009, Biostatistics.

[4] Shai Avidan et al. Spectral Bounds for Sparse PCA: Exact and Greedy Algorithms, 2005, NIPS.

[5] Joel A. Tropp et al. Living on the edge: A geometric theory of phase transitions in convex optimization, 2013, arXiv.

[6] R. Tibshirani et al. Sparse Principal Component Analysis, 2006.

[7] Paul Tseng et al. A coordinate gradient descent method for nonsmooth separable minimization, 2008, Math. Program.

[8] B. Moghaddam et al. Sparse regression as a sparse eigenvalue problem, 2008, Information Theory and Applications Workshop.

[9] Nathan Srebro et al. Sparse Prediction with the k-Support Norm, 2012, NIPS.

[10] Pablo A. Parrilo et al. The Convex Geometry of Linear Inverse Problems, 2010, Foundations of Computational Mathematics.

[11] Marc Teboulle et al. Conditional Gradient Algorithms for Rank-One Matrix Approximations with a Sparsity Constraint, 2011, SIAM Rev.

[12] Nicolas Vayatis et al. Estimation of Simultaneously Sparse and Low Rank Matrices, 2012, ICML.

[13] Jean Ponce et al. Convex Sparse Matrix Factorizations, 2008, arXiv.

[14] Lester W. Mackey et al. Deflation Methods for Sparse PCA, 2008, NIPS.

[15] Rina Foygel et al. Corrupted Sensing: Novel Guarantees for Separating Structured Signals, 2013, IEEE Transactions on Information Theory.

[16] Xiao-Tong Yuan et al. Truncated power method for sparse eigenvalue problems, 2011, J. Mach. Learn. Res.

[17] Roman Vershynin et al. Introduction to the non-asymptotic analysis of random matrices, 2010, Compressed Sensing.

[18] Xuan Vinh Doan et al. Finding approximately rank-one submatrices with the nuclear norm and l1 norm, 2010, arXiv:1011.1839.

[19] Rajat Raina et al. Efficient sparse coding algorithms, 2006, NIPS.

[20] Babak Hassibi et al. Asymptotically Exact Denoising in Relation to Compressed Sensing, 2013, arXiv.

[21] Stéphane Gaïffas et al. Link prediction in graphs with autoregressive features, 2012, J. Mach. Learn. Res.

[22] Yonina C. Eldar et al. Simultaneously Structured Models With Application to Sparse and Low-Rank Matrices, 2012, IEEE Transactions on Information Theory.

[23] R. Bhatia. Matrix Analysis, 1996.

[24] V. Koltchinskii et al. Nuclear norm penalization and optimal rates for noisy low rank matrix completion, 2010, arXiv:1011.6256.

[25] Julien Mairal et al. Structured sparsity through convex optimization, 2011, arXiv.

[26] G. Watson. Characterization of the subdifferential of some matrix norms, 1992.

[27] Guillermo Sapiro et al. Online Learning for Matrix Factorization and Sparse Coding, 2009, J. Mach. Learn. Res.

[28] Alexandre d'Aspremont et al. Optimal Solutions for Sparse Principal Component Analysis, 2007, J. Mach. Learn. Res.

[29] J. T. Chu. On bounds for the normal integral, 1955.

[30] Martin J. Wainwright et al. Information-Theoretic Limits on Sparsity Recovery in the High-Dimensional and Noisy Setting, 2007, IEEE Transactions on Information Theory.

[31] B. Nadler et al. Do Semidefinite Relaxations Really Solve Sparse PCA?, 2013.

[32] Jean-Philippe Vert et al. Group lasso with overlap and graph lasso, 2009, ICML '09.

[33] Emmanuel J. Candès et al. PhaseLift: Exact and Stable Signal Recovery from Magnitude Measurements via Convex Programming, 2011, arXiv.

[34] Yurii Nesterov et al. Generalized Power Method for Sparse Principal Component Analysis, 2008, J. Mach. Learn. Res.

[35] Michael I. Jordan et al. A Direct Formulation for Sparse PCA Using Semidefinite Programming, 2004, SIAM Rev.

[36] Martin J. Wainwright et al. A unified framework for high-dimensional analysis of M-estimators with decomposable regularizers, 2009, NIPS.

[37] Julien Mairal et al. Optimization with Sparsity-Inducing Penalties, 2011, Found. Trends Mach. Learn.

[38] Francis R. Bach et al. Intersecting singularities for multi-structured estimation, 2013, ICML.

[39] Emmanuel J. Candès et al. How well can we estimate a sparse vector?, 2011, arXiv.

[40] 丸山 徹 (T. Maruyama). On some developments in Convex Analysis (in Japanese), 1977.

[41] S. Szarek et al. Chapter 8: Local Operator Theory, Random Matrices and Banach Spaces, 2001.

[42] M. Wainwright et al. High-dimensional analysis of semidefinite relaxations for sparse principal components, 2008, IEEE International Symposium on Information Theory.

[43] Francis R. Bach et al. Convex relaxations of structured matrix factorizations, 2013, arXiv.

[44] Michael I. Jordan et al. Computational and statistical tradeoffs via convex relaxation, 2012, Proceedings of the National Academy of Sciences.

[45] G. Jameson. Summing and nuclear norms in Banach space theory, 1987.

[46] Philippe Rigollet et al. Complexity Theoretic Lower Bounds for Sparse Principal Component Detection, 2013, COLT.

[47] Joel A. Tropp et al. Living on the edge: phase transitions in convex programs with random data, 2013, arXiv:1303.6672.