Dictionary Learning and Tensor Decomposition via the Sum-of-Squares Method

We give a new approach to the dictionary learning (also known as "sparse coding") problem of recovering an unknown $n \times m$ matrix $A$ (for $m \ge n$) from examples of the form $y = Ax + e$, where $x$ is a random vector in $\mathbb{R}^m$ with at most $\tau m$ nonzero coordinates, and $e$ is a random noise vector in $\mathbb{R}^n$ with bounded magnitude. For the case $m = O(n)$, our algorithm recovers every column of $A$ within arbitrarily good constant accuracy in time $m^{O(\log m / \log(\tau^{-1}))}$. In particular, it runs in polynomial time if $\tau = m^{-\delta}$ for any $\delta > 0$, and in time $m^{O(\log m)}$ if $\tau$ is a (sufficiently small) constant. Prior algorithms with comparable assumptions on the distribution required the vector $x$ to be much sparser (at most $\sqrt{n}$ nonzero coordinates), and there were intrinsic barriers preventing these algorithms from applying to denser $x$. We achieve this by designing an algorithm for noisy tensor decomposition that can recover, under quite general conditions, an approximate rank-one decomposition of a tensor $T$, given access to a tensor $T'$ that is $\tau$-close to $T$ in the spectral norm (when considered as a matrix). To our knowledge, this is the first algorithm for tensor decomposition that works in the constant spectral-norm noise regime, where there is no guarantee that the local optima of $T$ and $T'$ have similar structures. Our algorithm is based on a novel approach to using and analyzing the Sum of Squares semidefinite programming hierarchy (Parrilo 2000, Lasserre 2001), and it can be viewed as an indication of the utility of this very general and powerful tool for unsupervised learning problems.
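To make the sample model and the spectral-norm noise condition concrete, here is a minimal NumPy sketch. It is not the authors' algorithm: it performs no recovery and no Sum-of-Squares relaxation. It only (i) draws samples $y = Ax + e$ and (ii) measures how far a perturbed tensor $T'$ is from $T = \sum_i a_i^{\otimes 4}$ in the spectral norm of the $n^2 \times n^2$ flattening. Everything specific here is an assumption of the sketch: the function names `sample_sparse_coding`, `fourth_order_tensor`, and `spectral_distance` are hypothetical; $x$ has exactly $\lfloor \tau m \rfloor$ Gaussian nonzeros on a uniformly random support; $e$ is rescaled to a fixed norm; the columns of $A$ are unit vectors; the tensor order is 4; and "close in spectral norm" is read as a bound on the largest singular value of the flattened difference, with no particular normalization implied.

```python
import numpy as np


def sample_sparse_coding(A, tau, noise_norm, num_samples, rng):
    """Return (Y, X), where each column of Y is a sample y = A x + e.

    Sketch assumptions: x has exactly max(1, floor(tau * m)) nonzero entries on a
    uniformly random support, drawn i.i.d. standard Gaussian; e is a random
    Gaussian direction rescaled to have Euclidean norm exactly noise_norm.
    """
    n, m = A.shape
    k = max(1, int(np.floor(tau * m)))
    X = np.zeros((m, num_samples))
    for j in range(num_samples):
        support = rng.choice(m, size=k, replace=False)
        X[support, j] = rng.standard_normal(k)
    E = rng.standard_normal((n, num_samples))
    E *= noise_norm / np.linalg.norm(E, axis=0)  # rescale each noise column
    return A @ X + E, X


def fourth_order_tensor(A):
    """T = sum_i a_i (x) a_i (x) a_i (x) a_i over the columns a_i of A."""
    return np.einsum("ai,bi,ci,di->abcd", A, A, A, A)


def spectral_distance(T, T_prime):
    """Largest singular value of T - T' flattened to an (n^2) x (n^2) matrix.

    Rows index the first two tensor modes, columns the last two.
    """
    n = T.shape[0]
    D = (T - T_prime).reshape(n * n, n * n)
    return np.linalg.norm(D, 2)


rng = np.random.default_rng(0)
n, m, tau = 8, 16, 0.25
A = rng.standard_normal((n, m))
A /= np.linalg.norm(A, axis=0)  # unit-norm columns (an assumption of this sketch)

Y, X = sample_sparse_coding(A, tau, noise_norm=0.1, num_samples=5, rng=rng)
T = fourth_order_tensor(A)

# Hypothetical perturbation: a random symmetric n^2 x n^2 matrix with spectral norm eps.
eps = 0.3
G = rng.standard_normal((n * n, n * n))
G = (G + G.T) / 2
G *= eps / np.linalg.norm(G, 2)
T_prime = T + G.reshape(n, n, n, n)

print(Y.shape, np.count_nonzero(X[:, 0]))
print(round(float(spectral_distance(T, T_prime)), 6))
```

The first line prints `(8, 5) 4`, since $\lfloor 0.25 \cdot 16 \rfloor = 4$ coordinates of each $x$ are nonzero, and the second prints `0.3`, because $T - T'$ flattens back to the injected matrix up to sign and floating-point rounding. The flattening pairs the first two tensor indices as rows and the last two as columns; a different pairing gives a different matrix, so that choice matters when interpreting a spectral-norm closeness condition.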

[1] A. Hurwitz. Ueber den Vergleich des arithmetischen und des geometrischen Mittels, 1891.

[2] L. Tucker, et al. Some mathematical notes on three-mode factor analysis, 1966, Psychometrika.

[3] Richard A. Harshman, et al. Foundations of the PARAFAC procedure: Models and conditions for an "explanatory" multi-modal factor analysis, 1970.

[4] J. Kruskal. Three-way arrays: rank and uniqueness of trilinear decompositions, with application to arithmetic complexity and statistics, 1977.

[5] B. Reznick. A quantitative version of Hurwitz' theorem on the arithmetic-geometric inequality, 1987.

[6] N. Z. Shor. An approach to obtaining global extremums in polynomial mathematical programming problems, 1987.

[7] B. Reznick. Forms derived from the arithmetic-geometric inequality, 1989.

[8] Pierre Comon, et al. Independent component analysis, A new concept?, 1994, Signal Process.

[9] David J. Field, et al. Emergence of simple-cell receptive field properties by learning a sparse code for natural images, 1996, Nature.

[10] Alan M. Frieze, et al. Learning linear transformations, 1996, Proceedings of 37th Conference on Foundations of Computer Science.

[11] D. Field, et al. Natural image statistics and efficient coding, 1996, Network.

[12] David J. Field, et al. Sparse coding with an overcomplete basis set: A strategy employed by V1?, 1997, Vision Research.

[13] F. Barthe. On a reverse form of the Brascamp-Lieb inequality, 1997, math/9705210.

[14] Yurii Nesterov, et al. Squared Functional Systems and Optimization Problems, 2000.

[15] P. Parrilo. Structured semidefinite programs and semialgebraic geometry methods in robustness and optimization, 2000.

[16] Dima Grigoriev, et al. Linear lower bound on degrees of Positivstellensatz calculus proofs for the parity, 2001, Theor. Comput. Sci.

[17] Jean B. Lasserre, et al. Global Optimization with Polynomials and the Problem of Moments, 2001, SIAM J. Optim.

[18] K. Roberts, et al. Thesis, 2002.

[19] Jürgen Forster. A linear lower bound on the unbounded error probabilistic communication complexity, 2002, J. Comput. Syst. Sci.

[20] P. Parrilo, et al. Distinguishing separable and entangled states, 2001, Physical Review Letters.

[21] P. Parrilo, et al. Symmetry groups, semidefinite programs, and sums of squares, 2002, math/0211450.

[22] A. Garulli, et al. Positive Polynomials in Control, 2005.

[23] E. Candès, et al. Stable signal recovery from incomplete and inaccurate measurements, 2005, math/0503066.

[24] Massimiliano Pontil, et al. Multi-Task Feature Learning, 2006, NIPS.

[25] Michael Elad, et al. Image Denoising Via Sparse and Redundant Representations Over Learned Dictionaries, 2006, IEEE Transactions on Image Processing.

[26] P. A. Parrilo, et al. Polynomial games and sum of squares optimization, 2006, Proceedings of the 45th IEEE Conference on Decision and Control.

[27] Lieven De Lathauwer, et al. Fourth-Order Cumulant-Based Blind Identification of Underdetermined Mixtures, 2007, IEEE Transactions on Signal Processing.

[28] Marc'Aurelio Ranzato, et al. Sparse Feature Learning for Deep Belief Networks, 2007, NIPS.

[29] John Harrison, et al. Verifying Nonlinear Real Formulas Via Sums of Squares, 2007, TPHOLs.

[30] Phong Q. Nguyen, et al. Learning a Parallelepiped: Cryptanalysis of GGH and NTRU Signatures, 2009, Journal of Cryptology.

[31] Thomas S. Huang, et al. Image super-resolution as sparse representation of raw image patches, 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[32] Grant Schoenebeck, et al. Linear Level Lasserre Lower Bounds for Certain k-CSPs, 2008, 2008 49th Annual IEEE Symposium on Foundations of Computer Science.

[33] Martial Hebert, et al. Discriminative Sparse Image Models for Class-Specific Edge Detection and Image Interpretation, 2008, ECCV.

[34] Jonah Sherman, et al. Breaking the Multicommodity Flow Barrier for O(√log n)-Approximations to Sparsest Cut, 2009, 2009 50th Annual IEEE Symposium on Foundations of Computer Science.

[35] Ruslan Salakhutdinov, et al. Practical Large-Scale Optimization for Max-norm Regularization, 2010, NIPS.

[36] Andrea Montanari, et al. The Noise-Sensitivity Phase Transition in Compressed Sensing, 2010, IEEE Transactions on Information Theory.

[37] Yuan Zhou, et al. Hypercontractivity, sum-of-squares proofs, and their applications, 2012, STOC '12.

[38] P. Frenkel, et al. Minkowski's inequality and sums of squares, 2012, 1206.5783.

[39] Huan Wang, et al. Exact Recovery of Sparsely-Used Dictionaries, 2012, COLT.

[40] Sanjeev Arora, et al. Learning Topic Models -- Going beyond SVD, 2012, 2012 IEEE 53rd Annual Symposium on Foundations of Computer Science.

[41] Sanjeev Arora, et al. A Practical Algorithm for Topic Modeling with Provable Guarantees, 2012, ICML.

[42] L. Demanet, et al. Recovering the Sparsest Element in a Subspace, 2013.

[43] Anima Anandkumar, et al. Exact Recovery of Sparsely Used Overcomplete Dictionaries, 2013, ArXiv.

[44] David Steurer, et al. Rounding sum-of-squares relaxations, 2013, Electron. Colloquium Comput. Complex.

[45] D. L. Donoho, et al. Compressed sensing, 2006, IEEE Trans. Inf. Theory.

[46] October 2013, 2014, Leonardo.

[47] Prateek Jain, et al. Learning Sparsely Used Overcomplete Dictionaries, 2014, COLT.

[48] Aditya Bhaskara, et al. Smoothed analysis of tensor decompositions, 2013, STOC.

[49] Aditya Bhaskara, et al. More Algorithms for Provable Dictionary Learning, 2014, ArXiv.

[50] Sanjeev Arora, et al. New Algorithms for Learning Incoherent and Overcomplete Dictionaries, 2013, COLT.

[51] David Steurer, et al. Sum-of-squares proofs and the quest toward optimal algorithms, 2014, Electron. Colloquium Comput. Complex.

[52] Santosh S. Vempala, et al. Fourier PCA and robust tensor decomposition, 2013, STOC.

[53] Anima Anandkumar, et al. A Spectral Algorithm for Latent Dirichlet Allocation, 2012, Algorithmica.

[54] Aditya Bhaskara, et al. Uniqueness of Tensor Decompositions with Applications to Polynomial Identifiability, 2013, COLT.

[55] Prateek Jain, et al. Learning Sparsely Used Overcomplete Dictionaries via Alternating Minimization, 2013, SIAM J. Optim.