SNAP: Finding Approximate Second-Order Stationary Solutions Efficiently for Non-convex Linearly Constrained Problems

This paper proposes low-complexity algorithms for finding approximate second-order stationary points (SOSPs) of problems with smooth non-convex objectives and linear constraints. While finding (approximate) SOSPs is computationally intractable in the worst case, we first show that generic instances of the problem can be solved efficiently. More specifically, for a generic problem instance, a certain strict complementarity (SC) condition holds at all Karush-Kuhn-Tucker (KKT) solutions (with probability one). The SC condition is then used to establish an equivalence between two different notions of SOSPs, one of which is computationally easy to verify. Based on this particular notion of SOSP, we design an algorithm named the Successive Negative-curvature grAdient Projection (SNAP) method, which successively performs either conventional gradient projection steps or negative-curvature-based projection steps to find SOSPs. SNAP and its first-order extension SNAP$^+$ require $\mathcal{O}(1/\epsilon^{2.5})$ iterations to compute an $(\epsilon, \sqrt{\epsilon})$-SOSP, and their per-iteration computational complexity is polynomial in the number of constraints and the problem dimension. To our knowledge, this is the first time that first-order algorithms with polynomial per-iteration complexity and a global sublinear rate have been designed to find SOSPs for this important class of non-convex problems with linear constraints.
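To make the high-level description concrete, below is a minimal sketch of a SNAP-style iteration for the special case of non-negativity constraints $x \ge 0$ (a simple instance of linear constraints, where the projection is coordinate-wise clipping). The toy objective, helper names, fixed step size, and the way the free coordinates are selected are illustrative assumptions, not the paper's exact rules; for general linear constraints the projection and the restricted negative-curvature computation are more involved.

```python
# Sketch of a SNAP-style loop: projected-gradient steps until first-order
# progress stalls, then a negative-curvature projection step to escape
# saddle points. Illustrative only; not the paper's exact algorithm.
import numpy as np

def project(x):
    """Euclidean projection onto the feasible set {x : x >= 0}."""
    return np.maximum(x, 0.0)

def snap_sketch(grad, hess, x0, eps=1e-3, step=1e-2, max_iter=5000):
    x = project(x0)
    for _ in range(max_iter):
        g = grad(x)
        x_new = project(x - step * g)
        # Take projected-gradient steps while they still make progress.
        if np.linalg.norm(x_new - x) / step > eps:
            x = x_new
            continue
        # Near first-order stationarity: inspect the Hessian restricted to
        # coordinates away from the active bound x_i = 0.
        free = x > eps
        if not np.any(free):
            return x
        H = hess(x)[np.ix_(free, free)]
        lam, V = np.linalg.eigh(H)
        if lam[0] >= -np.sqrt(eps):
            return x  # approximate (eps, sqrt(eps))-SOSP found
        d = np.zeros_like(x)
        d[free] = V[:, 0]
        d = -np.sign(g @ d + 1e-16) * d   # orient toward descent
        x = project(x + step * d)         # negative-curvature projection step
    return x

# Toy usage: an objective with an interior saddle at x1 = 2 and minima at
# x1 = 1 and x1 = 3; the iterate escapes the saddle via negative curvature.
grad_f = lambda x: np.array([(x[0] - 2) ** 3 - (x[0] - 2), x[1]])
hess_f = lambda x: np.diag([3 * (x[0] - 2) ** 2 - 1, 1.0])
print(snap_sketch(grad_f, hess_f, np.array([2.0, 0.5])))
```

In this sketch the two step types mirror the abstract's description: cheap gradient projection whenever it yields sufficient decrease, and a projected move along an (approximate) negative-curvature direction of the restricted Hessian only when the iterate is already nearly first-order stationary.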
