No-Regret Dynamics in the Fenchel Game: A Unified Framework for Algorithmic Convex Optimization

We develop an algorithmic framework for solving convex optimization problems using no-regret game dynamics. By converting the problem of minimizing a convex function into the auxiliary problem of solving a min-max game sequentially, we can consider a range of strategies for each of the two players, who must select their actions one after the other. A common choice for these strategies is a so-called no-regret learning algorithm; we describe several such algorithms and prove bounds on their regret. We then show that many classical first-order methods for convex optimization (including average-iterate gradient descent, the Frank-Wolfe algorithm, the Heavy Ball algorithm, and Nesterov's acceleration methods) can be interpreted as special cases of our framework, provided each player makes the appropriate choice of no-regret strategy. Proving convergence rates in this framework becomes straightforward, as they follow by plugging in the appropriate known regret bounds. Our framework also gives rise to a number of new first-order methods for special cases of convex optimization.
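To make the reduction concrete, here is a minimal sketch (our own illustration, not pseudocode from the paper) of no-regret dynamics in the Fenchel game min_x max_y <x, y> - f*(y). When the y-player runs Follow-The-Leader, its best response to the weighted-average iterate x_bar is simply the gradient of f at x_bar, and when the x-player best-responds with a linear minimization step, the dynamics recover the Frank-Wolfe algorithm; the standard reduction then bounds the error of the averaged iterate by the sum of the two players' weighted average regrets. The toy objective f(x) = 0.5 * ||x - b||^2, the simplex domain, the linear weights alpha_t = t, and the helper names (fenchel_game_fw, linear_minimizer_simplex) are illustrative assumptions, not definitions from the paper.

```python
import numpy as np

def grad_f(x, b):
    """Gradient of the toy objective f(x) = 0.5 * ||x - b||^2."""
    return x - b

def linear_minimizer_simplex(y):
    """x-player's best response argmin_{x in simplex} <x, y>: a simplex vertex."""
    x = np.zeros_like(y)
    x[np.argmin(y)] = 1.0
    return x

def fenchel_game_fw(b, T=200):
    """No-regret dynamics in the Fenchel game that reduce to Frank-Wolfe."""
    d = b.shape[0]
    x_bar = np.full(d, 1.0 / d)   # weighted average of x-player's actions
    weight_sum = 0.0
    for t in range(1, T + 1):
        alpha = float(t)                 # linear weights, i.e. FW step 2/(t+1)
        y = grad_f(x_bar, b)             # y-player: FTL best response = gradient
        x = linear_minimizer_simplex(y)  # x-player: best response = LMO step
        weight_sum += alpha
        x_bar += (alpha / weight_sum) * (x - x_bar)  # update weighted average
    return x_bar

if __name__ == "__main__":
    b = np.array([0.2, 0.5, 0.3])  # b lies in the simplex, so it is the minimizer
    print("approximate minimizer:", fenchel_game_fw(b))  # should approach b
```

Swapping in different no-regret strategies for the two players changes which classical method the dynamics reproduce; per the abstract, appropriate choices recover average-iterate gradient descent, Heavy Ball, and Nesterov-type acceleration, with rates read off from the corresponding regret bounds.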
