Ghost Penalties in Nonconvex Constrained Optimization: Diminishing Stepsizes and Iteration Complexity

We consider, for the first time, general diminishing stepsize methods for nonconvex constrained optimization problems. We show that, by using directions obtained in an SQP-like fashion, convergence to generalized stationary points can be proved. To do so, we use classical penalty functions in an unconventional way: penalty functions enter only the theoretical convergence analysis, while the algorithm itself is penalty-free. We then consider the iteration complexity of this method and of some variants in which the stepsize is either kept constant or decreased according to very simple rules. We establish convergence to $\delta$-approximate stationary points in at most $O(\delta^{-2})$, $O(\delta^{-3})$, or $O(\delta^{-4})$ iterations, depending on the assumptions made on the problem. These complexity results nicely complement the very few existing results in the field.
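The abstract describes the method only at a high level. The following is a minimal sketch of a penalty-free, diminishing stepsize iteration $x^{k+1} = x^k + \gamma^k d^k$, where $d^k$ solves an SQP-like subproblem. It rests on several assumptions that are not taken from the paper: a single smooth inequality constraint, a scaled identity $\tau I$ in place of an SQP Hessian approximation (so the subproblem has a closed-form solution), the rule $\gamma^k = \gamma^0/\sqrt{k+1}$, and the toy problem itself, which is convex only so that the result is easy to check. All function names are hypothetical. Unlike a general method, this sketch does not handle linearizations that become infeasible. No penalty function appears anywhere in the loop.

```python
import math

# Hypothetical toy problem (illustration only, not from the paper):
#   minimize   f(x) = x1 + x2
#   subject to c(x) = x1^2 + x2^2 - 2 <= 0
# Its unique KKT point is x* = (-1, -1), with multiplier 0.5.

def f_grad(x):
    return [1.0, 1.0]

def c_val(x):
    return x[0] ** 2 + x[1] ** 2 - 2.0

def c_grad(x):
    return [2.0 * x[0], 2.0 * x[1]]

def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))

def sqp_direction(x, tau=1.0):
    """Solve  min_d g^T d + (tau/2)||d||^2  s.t.  c + a^T d <= 0  in closed form.

    Assumption: tau*I replaces the SQP Hessian approximation. The KKT conditions
    give d = -(g + lam*a)/tau with lam = max(0, (tau*c - a^T g)/||a||^2).
    """
    g, a, c = f_grad(x), c_grad(x), c_val(x)
    num = tau * c - dot(a, g)
    # num <= 0: the unconstrained minimizer -g/tau already satisfies the
    # linearized constraint, so lam = 0. This branch also covers a = 0,
    # which here happens only at x = 0, where c < 0.
    lam = 0.0 if num <= 0.0 else num / dot(a, a)
    d = [-(gi + lam * ai) / tau for gi, ai in zip(g, a)]
    return d, lam

def diminishing_stepsize_sqp(x0, iters=200, gamma0=1.0, tau=1.0):
    """Penalty-free loop: x <- x + gamma_k * d_k, no line search, no merit function."""
    x = list(x0)
    for k in range(iters):
        d, _ = sqp_direction(x, tau)
        # Assumed diminishing rule: gamma_k -> 0 and the sum of gamma_k diverges.
        gamma = gamma0 / math.sqrt(k + 1)
        x = [xi + gamma * di for xi, di in zip(x, d)]
    return x

x = diminishing_stepsize_sqp([0.5, 0.0])
d, lam = sqp_direction(x)
print(x, lam)
```

On this toy problem, the iterates approach the KKT point $(-1, -1)$, the direction $d^k$ shrinks to zero, and the subproblem multiplier approaches $0.5$. The stepsize alone drives convergence; no penalty parameter is ever set or updated.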
