The nonconvex geometry of low-rank matrix optimizations with general objective functions

This work considers the minimization of a general convex function f(X) over the cone of positive semidefinite matrices, where the optimal solution X∗ has low rank. Standard first-order convex solvers require an eigenvalue decomposition at each iteration, which severely limits their scalability. A natural nonconvex reformulation factors the variable X into the product of a rectangular matrix with fewer columns and its transpose. For a special class of matrix sensing and completion problems with quadratic objective functions, local search algorithms applied to the factored problem have been shown to be much more efficient and, despite the nonconvexity, to converge to the global optimum. The purpose of this work is to extend this line of study to general convex objective functions f(X) and to investigate the geometry of the resulting factored formulations. Specifically, we prove that when f(X) satisfies the restricted well-conditioned assumption, each critical point of the factored problem either corresponds to the optimal solution X∗ or is a strict saddle point at which the Hessian matrix has a strictly negative eigenvalue. This geometric structure of the factored formulation ensures that many local search algorithms converge to the global optimum from random initializations.
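
To make the factored reformulation concrete, below is a minimal sketch (not taken from the paper) that applies plain gradient descent to the factored objective g(U) = f(UU^T) for a toy matrix-sensing instance with f(X) = (1/2m) Σ_i (⟨A_i, X⟩ − y_i)²; the problem sizes, sensing model, step size, and iteration count are illustrative assumptions chosen only to make the demo run.

```python
# Minimal sketch (illustrative, not the paper's algorithm): gradient descent
# on the factored objective g(U) = f(U U^T) for a toy matrix-sensing problem
# f(X) = (1/2m) * sum_i (<A_i, X> - y_i)^2.  Sizes, step size, and iteration
# count are assumptions, not values prescribed by the paper.
import numpy as np

rng = np.random.default_rng(0)
n, r, m = 20, 2, 300                         # dimension, rank, number of measurements

# Ground-truth low-rank PSD matrix X* = U* U*^T
U_star = rng.standard_normal((n, r))
X_star = U_star @ U_star.T

# Symmetric Gaussian sensing matrices A_i and measurements y_i = <A_i, X*>
A = rng.standard_normal((m, n, n))
A = (A + A.transpose(0, 2, 1)) / 2
y = np.einsum('kij,ij->k', A, X_star)

def grad_f(X):
    """Gradient of f(X) = (1/2m) * sum_i (<A_i, X> - y_i)^2."""
    residuals = np.einsum('kij,ij->k', A, X) - y
    return np.einsum('k,kij->ij', residuals, A) / m

# Gradient descent on g(U) = f(U U^T); since grad_f(X) is symmetric,
# the chain rule gives grad_g(U) = 2 * grad_f(U U^T) @ U.
U = rng.standard_normal((n, r))              # random initialization
step = 2e-3                                  # conservative step size for this instance
for _ in range(2000):
    U -= step * 2 * grad_f(U @ U.T) @ U

print("relative error:",
      np.linalg.norm(U @ U.T - X_star) / np.linalg.norm(X_star))
```

In this well-conditioned toy regime the factored iterate typically converges to a factorization of X∗ without any eigenvalue decompositions, which is the scalability advantage the abstract alludes to; the strict-saddle structure is what allows such a randomly initialized first-order method to avoid getting stuck at non-optimal critical points.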
