No Spurious Solutions in Non-convex Matrix Sensing: Structure Compensates for Isometry

This paper concerns the theoretical explanation of the recent empirical success of solving the low-rank matrix sensing problem via nonconvex optimization. It is known that when the sensing operator satisfies a restricted isometry assumption (the Restricted Isometry Property, or RIP) with a sufficiently small constant, the optimization problem has no spurious local minima. This assumption is too strong for many real-world applications, where the number of measurements is limited. We develop the notion of the Kernel Structure Property (KSP), which can be used on its own or in combination with RIP in this context. KSP explains how the inherent structure of a sensing operator contributes to the absence of spurious local minima. As a special case, we study sparse sensing operators that admit a low-dimensional representation. Using KSP, we obtain novel necessary and sufficient conditions for the absence of spurious solutions in matrix sensing and demonstrate their usefulness in analytical and numerical studies.
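For context, here is a minimal sketch of the standard setting in this line of work (the symmetric Burer-Monteiro formulation with a linear sensing operator; the symbols $\mathcal{A}$, $b$, $m$, $n$, $r$, and $\delta$ are generic notation for this sketch, not taken from the paper itself). The nonconvex program is

\[
\min_{X \in \mathbb{R}^{n \times r}} \; f(X) = \big\| \mathcal{A}(XX^{\top}) - b \big\|_2^2,
\qquad b = \mathcal{A}(M^{\star}), \quad \operatorname{rank}(M^{\star}) \le r,
\]

and the sensing operator $\mathcal{A} : \mathbb{S}^{n} \to \mathbb{R}^{m}$ satisfies RIP with constant $\delta \in [0, 1)$ over rank-$2r$ matrices if

\[
(1 - \delta)\, \| M \|_F^2 \;\le\; \| \mathcal{A}(M) \|_2^2 \;\le\; (1 + \delta)\, \| M \|_F^2
\quad \text{for all } M \text{ with } \operatorname{rank}(M) \le 2r.
\]

Intuitively, a small $\delta$ forces $f$ to behave like the fully observed objective $\| XX^{\top} - M^{\star} \|_F^2$, whose second-order critical points are all global minima; KSP instead certifies a benign landscape from the structure of $\mathcal{A}$ itself, even when such a uniform near-isometry bound is unavailable.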
