RECAPP: Crafting a More Efficient Catalyst for Convex Optimization

The accelerated proximal point algorithm (APPA), also known as “Catalyst”, is a well-established reduction from convex optimization to approximate proximal point computation (i.e., regularized minimization). This reduction is conceptually elegant and yields strong convergence rate guarantees. However, these rates feature an extraneous logarithmic term arising from the need to compute each proximal point to high accuracy. In this work, we propose a novel Relaxed Error Criterion for Accelerated Proximal Point (RECAPP) that eliminates the need for high-accuracy subproblem solutions. We apply RECAPP to two canonical problems: finite-sum and max-structured minimization. For finite-sum problems, we match the best known complexity, previously obtained by carefully-designed problem-specific algorithms. For minimizing $\max_y f(x, y)$ where $f$ is convex in $x$ and strongly concave in $y$, we improve on the best known (Catalyst-based) bound by a logarithmic factor.
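
For context, the sketch below writes out in LaTeX the standard proximal-point subproblem that Catalyst/APPA-style methods solve approximately at each iteration, together with the max-structured objective mentioned in the abstract. The regularization parameter $\lambda$ and the iterate $x_k$ are illustrative notation assumed for this sketch, not taken from the paper.

% A minimal sketch of the standard objects the abstract refers to; the
% regularization parameter \lambda and the iterate x_k are illustrative
% assumptions, not notation taken from the paper.
\documentclass{article}
\usepackage{amsmath}
\DeclareMathOperator*{\argmin}{arg\,min}
\begin{document}
% Approximate proximal-point (Catalyst/APPA) step: classical analyses
% require this regularized subproblem to be solved to high accuracy,
% which is the source of the extra logarithmic factor; RECAPP relaxes
% that error criterion.
\[
  x_{k+1} \approx \argmin_{x} \Bigl\{ f(x) + \tfrac{\lambda}{2} \| x - x_k \|^2 \Bigr\},
  \qquad \lambda > 0.
\]
% Max-structured minimization from the abstract: f(x, y) convex in x
% and strongly concave in y.
\[
  \min_{x} \max_{y} f(x, y).
\]
\end{document}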
