The non-convex geometry of low-rank matrix optimization

This work considers two popular minimization problems: (i) the minimization of a general convex function $f(\mathbf{X})$ with the domain being positive semi-definite matrices; (ii) the minimization of a general convex function $f(\mathbf{X})$ regularized by the matrix nuclear norm $\|\mathbf{X}\|_*$ with the domain being general matrices. Despite their optimal statistical performance in the literature, these two optimization problems have a high computational complexity even when solved using tailored fast convex solvers. To develop faster and more scalable algorithms, we follow the proposal of Burer and Monteiro to factor the low-rank variable $\mathbf{X} = \mathbf{U}\mathbf{U}^\top $ (for semi-definite matrices) or $\mathbf{X}=\mathbf{U}\mathbf{V}^\top $ (for general matrices) and also replace the nuclear norm $\|\mathbf{X}\|_*$ with $(\|\mathbf{U}\|_F^2+\|\mathbf{V}\|_F^2)/2$. In spite of the non-convexity of the resulting factored formulations, we prove that each critical point either corresponds to the global optimum of the original convex problems or is a strict saddle where the Hessian matrix has a strictly negative eigenvalue. Such a nice geometric structure of the factored formulations allows many local search algorithms to find a global optimizer even with random initializations.

[1]  P. Wolfe Convergence Conditions for Ascent Methods. II , 1969 .

[2]  Katta G. Murty,et al.  Some NP-complete problems in quadratic and nonlinear programming , 1987, Math. Program..

[3]  Eduardo D. Sontag,et al.  Backpropagation Can Give Rise to Spurious Local Minima Even for Networks without Hidden Layers , 1989, Complex Syst..

[4]  Sean R. Eddy,et al.  Profile hidden Markov models , 1998, Bioinform..

[5]  Stephen J. Wright,et al.  Numerical Optimization , 2018, Fundamental Statistical Inference.

[6]  Renato D. C. Monteiro,et al.  A nonlinear programming algorithm for solving semidefinite programs via low-rank factorization , 2003, Math. Program..

[7]  Tommi S. Jaakkola,et al.  Weighted Low-Rank Approximations , 2003, ICML.

[8]  Yinyu Ye,et al.  Semidefinite programming for ad hoc wireless sensor network localization , 2004, Third International Symposium on Information Processing in Sensor Networks, 2004. IPSN 2004.

[9]  Tommi S. Jaakkola,et al.  Maximum-Margin Matrix Factorization , 2004, NIPS.

[10]  Stephen P. Boyd,et al.  Convex Optimization , 2004, Algorithms and Theory of Computation Handbook.

[11]  Dennis DeCoste,et al.  Collaborative prediction using ensembles of Maximum Margin Matrix Factorizations , 2006, ICML.

[12]  Kim-Chuan Toh,et al.  Semidefinite Programming Approaches for Sensor Network Localization With Noisy Distance Measurements , 2006, IEEE Transactions on Automation Science and Engineering.

[13]  Robert E. Mahony,et al.  Optimization Algorithms on Matrix Manifolds , 2007 .

[14]  Scott Aaronson,et al.  The learnability of quantum states , 2006, Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences.

[15]  E. Candès The restricted isometry property and its implications for compressed sensing , 2008 .

[16]  Emmanuel J. Candès,et al.  Exact Matrix Completion via Convex Optimization , 2009, Found. Comput. Math..

[17]  Pablo A. Parrilo,et al.  Guaranteed Minimum-Rank Solutions of Linear Matrix Equations via Nuclear Norm Minimization , 2007, SIAM Rev..

[18]  Stephen Becker,et al.  Quantum state tomography via compressed sensing. , 2009, Physical review letters.

[19]  Emmanuel J. Candès,et al.  The Power of Convex Relaxation: Near-Optimal Matrix Completion , 2009, IEEE Transactions on Information Theory.

[20]  Emmanuel J. Candès,et al.  Matrix Completion With Noise , 2009, Proceedings of the IEEE.

[21]  Martin J. Wainwright,et al.  Fast global convergence rates of gradient methods for high-dimensional statistical recovery , 2010, NIPS.

[22]  Emmanuel J. Candès,et al.  Tight Oracle Inequalities for Low-Rank Matrix Recovery From a Minimal Number of Noisy Random Measurements , 2011, IEEE Transactions on Information Theory.

[23]  Nicolas Gillis,et al.  Low-Rank Matrix Approximation with Weights or Missing Data Is NP-Hard , 2010, SIAM J. Matrix Anal. Appl..

[24]  Ewout van den Berg,et al.  1-Bit Matrix Completion , 2012, ArXiv.

[25]  S. Sastry,et al.  Compressive Phase Retrieval From Squared Output Measurements Via Semidefinite Programming , 2011, 1111.6323.

[26]  Martin J. Wainwright,et al.  Restricted strong convexity and weighted matrix completion: Optimal bounds with noise , 2010, J. Mach. Learn. Res..

[27]  Alexandre Bernardino,et al.  Unifying Nuclear Norm and Bilinear Factorization Approaches for Low-Rank Matrix Decomposition , 2013, 2013 IEEE International Conference on Computer Vision.

[28]  Yoram Bresler,et al.  Near Optimal Compressed Sensing of Sparse Rank-One Matrices via Sparse Power Factorization , 2013, ArXiv.

[29]  Anima Anandkumar,et al.  Guaranteed Non-Orthogonal Tensor Decomposition via Alternating Rank-1 Updates , 2014, ArXiv.

[30]  J. Wellner,et al.  Chernoff's density is log-concave. , 2012, Bernoulli : official journal of the Bernoulli Society for Mathematical Statistics and Probability.

[31]  J. Wellner,et al.  Log-Concavity and Strong Log-Concavity: a review. , 2014, Statistics surveys.

[32]  Anima Anandkumar,et al.  Analyzing Tensor Power Method Dynamics: Applications to Learning Overcomplete Latent Variable Models , 2014, ArXiv.

[33]  Prateek Jain,et al.  Learning Sparsely Used Overcomplete Dictionaries , 2014, COLT.

[34]  Surya Ganguli,et al.  Identifying and attacking the saddle point problem in high-dimensional non-convex optimization , 2014, NIPS.

[35]  Aditya Bhaskara,et al.  More Algorithms for Provable Dictionary Learning , 2014, ArXiv.

[36]  Zuowei Shen,et al.  L0 Norm Based Dictionary Learning by Proximal Methods with Global Convergence , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[37]  Marc Teboulle,et al.  Proximal alternating linearized minimization for nonconvex and nonsmooth problems , 2013, Mathematical Programming.

[38]  T. Zhao,et al.  Nonconvex Low Rank Matrix Factorization via Inexact First Order Oracle , 2015 .

[39]  René Vidal,et al.  Global Optimality in Tensor Factorization, Deep Learning, and Beyond , 2015, ArXiv.

[40]  Kiryung Lee,et al.  RIP-like Properties in Subsampled Blind Deconvolution , 2015, ArXiv.

[41]  Sanjeev Arora,et al.  Simple, Efficient, and Neural Algorithms for Sparse Coding , 2015, COLT.

[42]  Yuxin Chen,et al.  Solving Random Quadratic Systems of Equations Is Nearly as Easy as Solving Linear Systems , 2015, NIPS.

[43]  Zhi-Quan Luo,et al.  Guaranteed Matrix Completion via Non-Convex Factorization , 2014, IEEE Transactions on Information Theory.

[44]  Furong Huang,et al.  Escaping From Saddle Points - Online Stochastic Gradient for Tensor Decomposition , 2015, COLT.

[45]  Sujay Sanghavi,et al.  The Local Convexity of Solving Systems of Quadratic Equations , 2015, 1506.07868.

[46]  Zhaoran Wang,et al.  A Nonconvex Optimization Framework for Low Rank Matrix Estimation , 2015, NIPS.

[47]  Christopher De Sa,et al.  Global Convergence of Stochastic Gradient Descent for Some Non-convex Matrix Problems , 2014, ICML.

[48]  John Wright,et al.  When Are Nonconvex Problems Not Scary? , 2015, ArXiv.

[49]  Lixin Shen,et al.  Overcomplete tensor decomposition via convex optimization , 2015, 2015 IEEE 6th International Workshop on Computational Advances in Multi-Sensor Adaptive Processing (CAMSAP).

[50]  Yonina C. Eldar,et al.  Phase Retrieval via Matrix Completion , 2011, SIAM Rev..

[51]  Martin J. Wainwright,et al.  Fast low-rank estimation by projected gradient descent: General statistical and algorithmic guarantees , 2015, ArXiv.

[52]  Anastasios Kyrillidis,et al.  Dropping Convexity for Faster Semi-definite Optimization , 2015, COLT.

[53]  Justin K. Romberg,et al.  An Overview of Low-Rank Matrix Recovery From Incomplete Observations , 2016, IEEE Journal of Selected Topics in Signal Processing.

[54]  Nicolas Boumal,et al.  The non-convex Burer-Monteiro approach works on smooth semidefinite programs , 2016, NIPS.

[55]  Ju Sun,et al.  When Are Nonconvex Optimization Problems Not Scary? , 2016 .

[56]  John Wright,et al.  A Geometric Analysis of Phase Retrieval , 2016, International Symposium on Information Theory.

[57]  Siam Rfview,et al.  CONVERGENCE CONDITIONS FOR ASCENT METHODS , 2016 .

[58]  John D. Lafferty,et al.  Convergence Analysis for Rectangular Matrix Completion Using Burer-Monteiro Factorization and Gradient Descent , 2016, ArXiv.

[59]  Tselil Schramm,et al.  Fast spectral algorithms from sum-of-squares proofs: tensor decomposition and planted sparse vectors , 2015, STOC.

[60]  Nathan Srebro,et al.  Global Optimality of Local Search for Low Rank Matrix Recovery , 2016, NIPS.

[61]  John Wright,et al.  Finding a Sparse Vector in a Subspace: Linear Sparsity Using Alternating Directions , 2014, IEEE Transactions on Information Theory.

[62]  Quoc Tran-Dinh,et al.  Extended Gauss-Newton and Gauss-Newton-ADMM Algorithms for Low-Rank Matrix Optimization , 2016, 1606.03358.

[63]  Sham M. Kakade,et al.  Provable Efficient Online Matrix Completion via Non-convex Stochastic Gradient Descent , 2016, NIPS.

[64]  Stephen P. Boyd,et al.  Generalized Low Rank Models , 2014, Found. Trends Mach. Learn..

[65]  Anastasios Kyrillidis,et al.  Provable Burer-Monteiro factorization for a class of norm-constrained matrix problems , 2016 .

[66]  Michael I. Jordan,et al.  Gradient Descent Converges to Minimizers , 2016, ArXiv.

[67]  Michael I. Jordan,et al.  Gradient Descent Only Converges to Minimizers , 2016, COLT.

[68]  Tengyu Ma,et al.  Matrix Completion has No Spurious Local Minimum , 2016, NIPS.

[69]  Max Simchowitz,et al.  Low-rank Solutions of Linear Matrix Equations via Procrustes Flow , 2015, ICML.

[70]  Junwei Lu,et al.  Symmetry, Saddle Points, and Global Geometry of Nonconvex Matrix Factorization , 2016, ArXiv.

[71]  Michael I. Jordan,et al.  Gradient Descent Can Take Exponential Time to Escape Saddle Points , 2017, NIPS.

[72]  Yi Zheng,et al.  No Spurious Local Minima in Nonconvex Low Rank Problems: A Unified Geometric Analysis , 2017, ICML.

[73]  Xiao Zhang,et al.  A Unified Computational and Statistical Framework for Nonconvex Low-rank Matrix Estimation , 2016, AISTATS.

[74]  Gongguo Tang,et al.  Convex and nonconvex geometries of symmetric tensor factorization , 2017, 2017 51st Asilomar Conference on Signals, Systems, and Computers.

[75]  Prateek Jain,et al.  Global Convergence of Non-Convex Gradient Descent for Computing Matrix Squareroot , 2015, AISTATS.

[76]  Federica Sciacchitano,et al.  Image reconstruction under non-Gaussian noise , 2017 .

[77]  Yue Sun,et al.  Low-Rank Positive Semidefinite Matrix Recovery From Corrupted Rank-One Measurements , 2016, IEEE Transactions on Signal Processing.

[78]  Anastasios Kyrillidis,et al.  Provable quantum state tomography via non-convex methods , 2017, ArXiv.

[79]  John Wright,et al.  Complete Dictionary Recovery Over the Sphere II: Recovery by Riemannian Trust-Region Method , 2015, IEEE Transactions on Information Theory.

[80]  Anastasios Kyrillidis,et al.  Non-square matrix sensing without spurious local minima via the Burer-Monteiro approach , 2016, AISTATS.

[81]  Michael I. Jordan,et al.  How to Escape Saddle Points Efficiently , 2017, ICML.

[82]  Anastasios Kyrillidis,et al.  Finding Low-rank Solutions to Matrix Problems, Efficiently and Provably , 2016, SIAM J. Imaging Sci..

[83]  Yoram Bresler,et al.  Near-Optimal Compressed Sensing of a Class of Sparse Low-Rank Matrices Via Sparse Power Factorization , 2013, IEEE Transactions on Information Theory.

[84]  Junwei Lu,et al.  Symmetry. Saddle Points, and Global Optimization Landscape of Nonconvex Matrix Factorization , 2016, 2018 Information Theory and Applications Workshop (ITA).

[85]  Zhihui Zhu,et al.  Global Optimality in Low-Rank Matrix Optimization , 2017, IEEE Transactions on Signal Processing.