On Constraints in First-Order Optimization: A View from Non-Smooth Dynamical Systems

We introduce a class of first-order methods for smooth constrained optimization that are based on an analogy to non-smooth dynamical systems. Two distinctive features of our approach are that (i) projections or optimizations over the entire feasible set are avoided, in stark contrast to projected gradient methods or the Frank-Wolfe method, and (ii) iterates are allowed to become infeasible, which differs from active set or feasible direction methods, where the descent motion stops as soon as a new constraint is encountered. The resulting algorithmic procedure is simple to implement even when constraints are nonlinear, and is suitable for large-scale constrained optimization problems in which the feasible set fails to have a simple structure. The key underlying idea is that constraints are expressed in terms of velocities instead of positions, which has the algorithmic consequence that optimizations over feasible sets at each iteration are replaced with optimizations over local, sparse convex approximations. The result is a simplified suite of algorithms and an expanded range of possible applications in machine learning.
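The core mechanism described above, replacing a projection onto the full feasible set with a constraint on the velocity built from a local linear approximation, can be illustrated with a minimal sketch. The code below is not the authors' algorithm; it is a hypothetical single-constraint instance in which, whenever the constraint is nearly active or violated, the descent velocity is projected onto the halfspace `{v : ⟨∇g(x), v⟩ ≤ −α·g(x)}`. The restoring term `−α·g(x)` lets iterates become infeasible and then steers them back, and the per-step subproblem is a closed-form projection onto one halfspace rather than an optimization over the feasible set. The function name, step size, and parameters `alpha` and `eps` are illustrative choices, not from the paper.

```python
import numpy as np

def velocity_constrained_descent(x0, grad_f, g, grad_g, step=0.05,
                                 alpha=2.0, eps=1e-3, iters=500):
    """Gradient descent with the constraint expressed on the velocity.

    Near or past the boundary (g(x) >= -eps), the raw descent direction
    v = -grad_f(x) is projected onto the local halfspace
        {v : <grad_g(x), v> <= -alpha * g(x)},
    a sparse linear approximation of the feasible set at x. Iterates
    may leave {g <= 0} temporarily; the -alpha*g(x) term pulls them back.
    """
    x = np.asarray(x0, dtype=float)
    for _ in range(iters):
        v = -grad_f(x)                    # unconstrained descent velocity
        if g(x) >= -eps:                  # constraint locally active
            n = grad_g(x)
            viol = n @ v + alpha * g(x)   # halfspace violation of v
            if viol > 0.0:
                v = v - (viol / (n @ n)) * n   # closed-form projection
        x = x + step * v
    return x
```

For example, minimizing `||x - (2, 0)||²` over the unit disk `g(x) = ||x||² − 1 ≤ 0` drives the iterates to the boundary point `(1, 0)` without ever solving a projection onto the disk itself; only one-dimensional halfspace projections are used.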
