Run-and-Inspect Method for nonconvex optimization and global optimality bounds for R-local minimizers

Many optimization algorithms converge to stationary points. When the underlying problem is nonconvex, they may get trapped at local minimizers or stagnate near saddle points. We propose the Run-and-Inspect Method, which adds an “inspect” phase to existing algorithms to help them escape from non-global stationary points. The inspection samples a set of points within a radius R of the current point. If a sample point yields a sufficient decrease in the objective, the existing algorithm is resumed from that point; if no sufficient decrease is found, the current point is declared an approximate R-local minimizer. We show that an R-local minimizer is globally optimal, up to an error depending on R, provided the objective function can be implicitly decomposed into a smooth convex function plus a restricted function that is possibly nonconvex and nonsmooth. For such nonconvex objective functions, verifying global optimality is therefore fundamentally easier. For high-dimensional problems, we introduce blockwise inspections to overcome the curse of dimensionality while maintaining optimality bounds up to a factor equal to the number of blocks. We also present the sample complexities of these methods. Applying our method on top of existing algorithms to a set of artificial and realistic nonconvex problems, we observe significantly improved chances of obtaining global minima.
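The following is a minimal sketch of the run-and-inspect loop described above, assuming a generic local solver `run_phase` (e.g., a few hundred gradient-descent iterations) and uniform random sampling in the ball of radius R for the inspection; the function names, sample count, and sufficient-decrease tolerance are illustrative assumptions, and the paper's exact sampling scheme and blockwise variant may differ.

```python
import numpy as np

def inspect(f, x, R, num_samples=100, decrease_tol=1e-6, rng=None):
    """Sample points within radius R of x; return one with sufficient decrease, else None."""
    rng = np.random.default_rng() if rng is None else rng
    for _ in range(num_samples):
        direction = rng.standard_normal(x.shape)
        direction /= np.linalg.norm(direction)
        radius = R * rng.uniform() ** (1.0 / x.size)   # uniform sampling in the R-ball
        y = x + radius * direction
        if f(y) < f(x) - decrease_tol:                 # sufficient decrease found
            return y
    return None                                        # x is an approximate R-local minimizer

def run_and_inspect(f, run_phase, x0, R, max_rounds=50):
    """Alternate an existing algorithm ("run") with the inspection phase."""
    x = run_phase(f, x0)                               # run an existing algorithm to a stationary point
    for _ in range(max_rounds):
        y = inspect(f, x, R)
        if y is None:                                  # no sufficient decrease: stop and return x
            break
        x = run_phase(f, y)                            # resume the existing algorithm from y
    return x
```

In this sketch the radius R trades off the cost of inspection against the tightness of the resulting global optimality bound: larger R makes escaping non-global stationary points more likely but requires more samples per inspection.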
