The nonconvex geometry of low-rank matrix optimizations with general objective functions

This work considers the minimization of a general convex function f(X) over the cone of positive semidefinite matrices whose optimal solution X∗ has low rank. Standard first-order convex solvers require an eigenvalue decomposition in each iteration, which severely limits their scalability. A natural nonconvex reformulation of the problem factors the variable X into the product of a rectangular matrix with fewer columns and its transpose. For a special class of matrix sensing and completion problems with quadratic objective functions, local search algorithms applied to the factored problem have been shown to be much more efficient and, despite the nonconvexity, to converge to the global optimum. The purpose of this work is to extend this line of study to general convex objective functions f(X) and to investigate the geometry of the resulting factored formulations. Specifically, we prove that when f(X) satisfies the restricted well-conditioned assumption, each critical point of the factored problem either corresponds to the optimal solution X∗ or is a strict saddle point at which the Hessian has a strictly negative eigenvalue. This geometric structure of the factored formulation ensures that many local search algorithms converge to the global optimum from random initializations.
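As a concrete illustration of the factored approach, the following is a minimal sketch (not the authors' implementation) of gradient descent on the factored objective g(U) = f(UU^T) for a quadratic matrix-sensing instance of f. The problem sizes, Gaussian sensing matrices A_i, random initialization, and heuristic step size are all illustrative assumptions.

```python
# Minimal sketch (not the paper's code): gradient descent on the factored
# objective g(U) = f(U U^T), where f is a quadratic matrix-sensing loss
#   f(X) = 1/(2m) * sum_i (<A_i, X> - y_i)^2.
# The sizes, sensing matrices, initialization, and step size below are
# illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
n, r, m = 30, 2, 600                          # dimension, rank, number of measurements

# Ground-truth low-rank PSD matrix X* = U* U*^T and symmetric Gaussian sensing matrices A_i
U_star = rng.standard_normal((n, r))
X_star = U_star @ U_star.T
A = rng.standard_normal((m, n, n))
A = (A + A.transpose(0, 2, 1)) / 2            # symmetrize each A_i
y = np.einsum('kij,ij->k', A, X_star)         # measurements y_i = <A_i, X*>

def grad_f(X):
    """Gradient of f(X) = 1/(2m) * sum_i (<A_i, X> - y_i)^2."""
    residual = np.einsum('kij,ij->k', A, X) - y
    return np.einsum('k,kij->ij', residual, A) / m

# Gradient of the factored objective: since grad f(X) is symmetric here,
# grad g(U) = 2 * grad f(U U^T) @ U.
U = 0.1 * rng.standard_normal((n, r))         # random initialization
step = 0.25 / np.linalg.norm(grad_f(np.zeros((n, n))), 2)   # heuristic step size
for _ in range(500):
    U -= step * 2 * grad_f(U @ U.T) @ U

print("relative error:", np.linalg.norm(U @ U.T - X_star) / np.linalg.norm(X_star))
```

With enough random measurements, the factored iterate typically recovers X∗ up to an orthogonal rotation of U, consistent with the strict-saddle geometry described above.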
