No-Regret Dynamics in the Fenchel Game: A Unified Framework for Algorithmic Convex Optimization

We develop an algorithmic framework for solving convex optimization problems using no-regret game dynamics. By converting the problem of minimizing a convex function into the auxiliary problem of solving a min-max game sequentially, we can consider a range of strategies for each of the two players, who must select their actions one after the other. A common choice for these strategies is a so-called no-regret learning algorithm; we describe several such algorithms and prove bounds on their regret. We then show that many classical first-order methods for convex optimization (including average-iterate gradient descent, the Frank-Wolfe algorithm, the Heavy Ball algorithm, and Nesterov's acceleration methods) can be interpreted as special cases of our framework, provided each player makes the appropriate choice of no-regret strategy. Proving convergence rates in this framework becomes straightforward, as they follow by plugging in the appropriate known regret bounds. Our framework also gives rise to a number of new first-order methods for special cases of convex optimization.
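To make the reduction concrete, here is a minimal sketch (our own illustration, not pseudocode from the paper) of no-regret dynamics in the Fenchel game min_x max_y <x, y> - f*(y). When the y-player runs Follow-The-Leader, its best response to the weighted-average iterate x_bar is simply the gradient of f at x_bar, and when the x-player best-responds with a linear minimization step, the dynamics recover the Frank-Wolfe algorithm; the standard reduction then bounds the error of the averaged iterate by the sum of the two players' weighted average regrets. The toy objective f(x) = 0.5 * ||x - b||^2, the simplex domain, the linear weights alpha_t = t, and the helper names (fenchel_game_fw, linear_minimizer_simplex) are illustrative assumptions, not definitions from the paper.

```python
import numpy as np

def grad_f(x, b):
    """Gradient of the toy objective f(x) = 0.5 * ||x - b||^2."""
    return x - b

def linear_minimizer_simplex(y):
    """x-player's best response argmin_{x in simplex} <x, y>: a simplex vertex."""
    x = np.zeros_like(y)
    x[np.argmin(y)] = 1.0
    return x

def fenchel_game_fw(b, T=200):
    """No-regret dynamics in the Fenchel game that reduce to Frank-Wolfe."""
    d = b.shape[0]
    x_bar = np.full(d, 1.0 / d)   # weighted average of x-player's actions
    weight_sum = 0.0
    for t in range(1, T + 1):
        alpha = float(t)                 # linear weights, i.e. FW step 2/(t+1)
        y = grad_f(x_bar, b)             # y-player: FTL best response = gradient
        x = linear_minimizer_simplex(y)  # x-player: best response = LMO step
        weight_sum += alpha
        x_bar += (alpha / weight_sum) * (x - x_bar)  # update weighted average
    return x_bar

if __name__ == "__main__":
    b = np.array([0.2, 0.5, 0.3])  # b lies in the simplex, so it is the minimizer
    print("approximate minimizer:", fenchel_game_fw(b))  # should approach b
```

Swapping in different no-regret strategies for the two players changes which classical method the dynamics reproduce; per the abstract, appropriate choices recover average-iterate gradient descent, Heavy Ball, and Nesterov-type acceleration, with rates read off from the corresponding regret bounds.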
