Adaptive Catalyst for Smooth Convex Optimization

In 2015, the universal framework Catalyst was introduced; it allows one to accelerate almost any non-accelerated deterministic or randomized algorithm for smooth convex optimization problems (Lin et al., 2015). This technique has found many applications in machine learning due to its ability to handle sum-type objective functions. A key ingredient of the Catalyst approach is an accelerated proximal outer gradient method, which serves as an envelope for a non-accelerated inner algorithm applied to a regularized auxiliary problem. One of the main practical difficulties of this approach is the choice of the regularization parameter. A theory for this choice exists (Lin et al., 2018), but it requires prior knowledge of the smoothness constant of the objective function. In this paper, we propose an adaptive variant of Catalyst that does not require such information. In combination with adaptive non-accelerated inner algorithms, we obtain accelerated variants of well-known methods: steepest descent, adaptive coordinate descent, and alternating minimization.
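To make the structure of the envelope concrete, the sketch below shows a Catalyst-style outer loop in Python. At each outer iteration an inner (non-accelerated) method approximately minimizes the regularized auxiliary problem f(x) + (L/2)||x - y||^2, after which the standard extrapolation step of Lin et al. (2015) is applied. This is a minimal sketch under our own assumptions: the rule used here for adapting the regularization parameter L (growing or shrinking it according to how many inner iterations were needed) is only an illustrative heuristic, and all names (adaptive_catalyst, inner_budget, up, down, etc.) are hypothetical rather than the paper's notation.

```python
import numpy as np


def adaptive_catalyst(grad_f, inner_solve, x0, L0=1.0, n_outer=30,
                      inner_budget=200, up=2.0, down=2.0):
    """Catalyst-style outer envelope with an adaptively tuned regularization
    parameter L (illustrative sketch, not the paper's exact procedure)."""
    x_prev = np.asarray(x0, dtype=float)
    y = x_prev.copy()
    L = L0
    alpha = 1.0  # momentum sequence for the smooth convex (mu = 0) case

    for _ in range(n_outer):
        # Approximately minimize the auxiliary problem
        #   F_L(x) = f(x) + (L / 2) * ||x - y||^2
        # with the supplied non-accelerated inner method.
        aux_grad = lambda x, L=L, y=y: grad_f(x) + L * (x - y)
        x, used = inner_solve(aux_grad, y, max_iter=inner_budget)

        # Heuristic adaptation of L (an assumption, not the paper's rule):
        # a subproblem that exhausts the inner budget was too hard, so make
        # it easier (larger L); a subproblem solved very quickly suggests L
        # can be decreased to improve the outer rate.
        if used >= inner_budget:
            L *= up
        elif used <= inner_budget // 4:
            L /= down

        # Catalyst extrapolation step (Lin et al., 2015), mu = 0 case:
        #   alpha_{k+1}^2 = (1 - alpha_{k+1}) * alpha_k^2
        alpha_next = 0.5 * (np.sqrt(alpha ** 4 + 4 * alpha ** 2) - alpha ** 2)
        beta = alpha * (1 - alpha) / (alpha ** 2 + alpha_next)
        y = x + beta * (x - x_prev)
        x_prev, alpha = x, alpha_next

    return x_prev


def gradient_inner_solver(step=0.1, tol=1e-6):
    """Deliberately simple non-accelerated inner method: fixed-step gradient
    descent on the auxiliary problem; reports how many iterations it used."""
    def solve(aux_grad, y, max_iter):
        x = y.copy()
        for k in range(1, max_iter + 1):
            g = aux_grad(x)
            if np.linalg.norm(g) < tol:
                return x, k
            x = x - step * g
        return x, max_iter
    return solve


if __name__ == "__main__":
    # Toy quadratic f(x) = 0.5 * x^T A x - b^T x, so grad f(x) = A x - b.
    A = np.array([[3.0, 0.5], [0.5, 1.0]])
    b = np.array([1.0, -2.0])
    grad_f = lambda x: A @ x - b

    x_hat = adaptive_catalyst(grad_f, gradient_inner_solver(), x0=np.zeros(2))
    print("approximate minimizer:", x_hat)
    print("exact minimizer:      ", np.linalg.solve(A, b))
```

On the toy quadratic above, the loop approaches the exact minimizer np.linalg.solve(A, b); swapping gradient_inner_solver for a coordinate-descent or alternating-minimization routine with the same interface conveys the flavor of the accelerated variants discussed in the paper.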

[1]  Eduard A. Gorbunov, et al.  Reachability of Optimal Convergence Rate Estimates for High-Order Numerical Convex Optimization Methods, 2019, Doklady Akademii Nauk.

[2]  Andre Wibisono,et al.  Accelerating Rescaled Gradient Descent: Fast Optimization of Smooth Functions , 2019, NeurIPS.

[3]  Zaïd Harchaoui,et al.  A Universal Catalyst for First-Order Optimization , 2015, NIPS.

[4]  Amir Beck,et al.  First-Order Methods in Optimization , 2017 .

[5]  Martin J. Wainwright,et al.  Optimal Rates for Zero-Order Convex Optimization: The Power of Two Function Evaluations , 2013, IEEE Transactions on Information Theory.

[6]  A. V. Gasnikov, et al.  Primal-dual accelerated gradient descent with line search for convex and nonconvex optimization problems, 2019, Doklady Akademii Nauk.

[7]  Peter Richtárik,et al.  SGD: General Analysis and Improved Rates , 2019, ICML 2019.

[8]  Pavel Dvurechensky,et al.  Optimal Combination of Tensor Optimization Methods , 2020, OPTIMA.

[9]  Robert M. Gower,et al.  Optimal mini-batch and step sizes for SAGA , 2019, ICML.

[10]  Julien Mairal,et al.  A Generic Acceleration Framework for Stochastic Composite Optimization , 2019, NeurIPS.

[11]  Zeyuan Allen Zhu,et al.  Optimal Black-Box Reductions Between Optimization Objectives , 2016, NIPS.

[12]  A. Gasnikov,et al.  Accelerated Gradient Sliding for Minimizing a Sum of Functions , 2020 .

[13]  S. Guminov,et al.  Alternating minimization methods for strongly convex optimization , 2019, Journal of Inverse and Ill-posed Problems.

[14]  Alexander Gasnikov,et al.  Optimal Decentralized Distributed Algorithms for Stochastic Convex Optimization. , 2019 .

[15]  Niao He,et al.  A Catalyst Framework for Minimax Optimization , 2020, NeurIPS.

[16]  Jelena Diakonikolas,et al.  Conjugate Gradients and Accelerated Methods Unified: The Approximate Duality Gap View , 2019, ArXiv.

[17]  Dmitry Kovalev,et al.  Optimal and Practical Algorithms for Smooth and Strongly Convex Decentralized Optimization , 2020, NeurIPS.

[18]  Sashank J. Reddi,et al.  SCAFFOLD: Stochastic Controlled Averaging for Federated Learning , 2019, ICML.

[19]  Sebastian U. Stich,et al.  Local SGD Converges Fast and Communicates Little , 2018, ICLR.

[20]  Peter Richtárik,et al.  Better Communication Complexity for Local SGD , 2019, ArXiv.

[21]  Alexander Gasnikov,et al.  Gradient-free two-points optimal method for non smooth stochastic convex optimization problem with additional small noise , 2017 .

[22]  Peter Richtárik,et al.  A Unified Theory of SGD: Variance Reduction, Sampling, Quantization and Coordinate Descent , 2019, AISTATS.

[23]  D. Hilbert Ein Beitrag zur Theorie des Legendre'schen Polynoms , 1894 .

[24]  Alexander Gasnikov,et al.  Accelerated Alternating Minimization , 2019, ArXiv.

[25]  O. Nelles,et al.  An Introduction to Optimization , 1996, IEEE Antennas and Propagation Magazine.

[26]  Ohad Shamir,et al.  Is Local SGD Better than Minibatch SGD? , 2020, ICML.

[28]  Alexander Gasnikov,et al.  Adaptive Gradient Descent for Convex and Non-Convex Stochastic Optimization , 2019, 1911.08380.

[29]  Zhouchen Lin,et al.  Revisiting EXTRA for Smooth Distributed Optimization , 2020, SIAM J. Optim..

[30]  Stephen J. Wright Coordinate descent algorithms , 2015, Mathematical Programming.

[31]  John Darzentas,et al.  Problem Complexity and Method Efficiency in Optimization , 1983 .

[32]  Michael I. Jordan,et al.  On Gradient Descent Ascent for Nonconvex-Concave Minimax Problems , 2019, ICML.

[33]  Ohad Shamir,et al.  An Optimal Algorithm for Bandit and Zero-Order Convex Optimization with Two-Point Feedback , 2015, J. Mach. Learn. Res..

[34]  Stephen P. Boyd,et al.  Proximal Algorithms , 2013, Found. Trends Optim..

[35]  P. Dvurechensky,et al.  Oracle Complexity Separation in Convex Optimization , 2020, J. Optim. Theory Appl..

[36]  Peter Richtárik,et al.  Accelerated, Parallel, and Proximal Coordinate Descent , 2013, SIAM J. Optim..

[37]  Yurii Nesterov,et al.  Lectures on Convex Optimization , 2018 .

[38]  Sébastien Bubeck,et al.  Convex Optimization: Algorithms and Complexity , 2014, Found. Trends Mach. Learn..

[39]  Tong Zhang,et al.  Accelerated proximal stochastic dual coordinate ascent for regularized loss minimization , 2013, Mathematical Programming.

[40]  Etienne de Klerk,et al.  On the worst-case complexity of the gradient method with exact line search for smooth strongly convex functions , 2016, Optimization Letters.

[41]  Chih-Jen Lin,et al.  LIBSVM: A library for support vector machines , 2011, TIST.

[42]  Eduard A. Gorbunov,et al.  An Accelerated Directional Derivative Method for Smooth Stochastic Convex Optimization , 2018, Eur. J. Oper. Res..

[43]  Jérôme Malick,et al.  A Delay-tolerant Proximal-Gradient Algorithm for Distributed Learning , 2018, ICML.

[44]  Hadrien Hendrikx,et al.  Dual-Free Stochastic Decentralized Optimization with Variance Reduction , 2020, NeurIPS.

[45]  Yin Tat Lee,et al.  Near Optimal Methods for Minimizing Convex Functions with Lipschitz $p$-th Derivatives , 2019, Annual Conference Computational Learning Theory.

[46]  Yurii Nesterov,et al.  Efficiency of Coordinate Descent Methods on Huge-Scale Optimization Problems , 2012, SIAM J. Optim..

[47]  Yurii Nesterov,et al.  Contracting Proximal Methods for Smooth Convex Optimization , 2019, SIAM J. Optim..

[48]  Renato D. C. Monteiro,et al.  An Accelerated Hybrid Proximal Extragradient Method for Convex Optimization and Its Implications to Second-Order Methods , 2013, SIAM J. Optim..

[49]  R. Rockafellar Monotone Operators and the Proximal Point Algorithm , 1976 .

[50]  Zaïd Harchaoui,et al.  Catalyst Acceleration for First-order Convex Optimization: from Theory to Practice , 2017, J. Mach. Learn. Res..

[51]  A. Gasnikov,et al.  Near-Optimal Hyperfast Second-Order Method for convex optimization and its Sliding. , 2020, 2002.09050.

[52]  Yurii Nesterov,et al.  Confidence level solutions for stochastic programming , 2000, Autom..

[53]  A. Gasnikov Universal gradient descent , 2017, 1711.00394.

[54]  N. Tupitsa Accelerated Alternating Minimization and Adaptability to Strong Convexity , 2020, 2006.09097.

[55]  Yifan Hu,et al.  Collaborative Filtering for Implicit Feedback Datasets , 2008, 2008 Eighth IEEE International Conference on Data Mining.

[56]  Yurii Nesterov,et al.  Inexact Tensor Methods with Dynamic Accuracies , 2020, ICML.

[57]  Eduard A. Gorbunov,et al.  An Accelerated Method for Derivative-Free Smooth Stochastic Convex Optimization , 2018, SIAM J. Optim..

[58]  Yurii Nesterov,et al.  Efficiency of the Accelerated Coordinate Descent Method on Structured Optimization Problems , 2017, SIAM J. Optim..

[59]  Alexander V. Gasnikov,et al.  Gradient-free proximal methods with inexact oracle for convex stochastic nonsmooth optimization problems on the simplex , 2016, Automation and Remote Control.

[60]  Jelena Diakonikolas,et al.  Alternating Randomized Block Coordinate Descent , 2018, ICML.

[61]  Z. Harchaoui,et al.  Catalyst Acceleration for Gradient-Based Non-Convex Optimization , 2017, 1703.10993.

[62]  Nathan Srebro,et al.  Graph Oracle Models, Lower Bounds, and Gaps for Parallel Stochastic Optimization , 2018, NeurIPS.

[64]  Francis R. Bach,et al.  Stochastic Variance Reduction Methods for Saddle-Point Problems , 2016, NIPS.

[65]  P. Dvurechensky,et al.  Accelerated meta-algorithm for convex optimization , 2020, 2004.08691.

[66]  Optimal Accelerated Variance Reduced EXTRA and DIGing for Strongly Convex and Smooth Decentralized Optimization , 2020, ArXiv.

[67]  Anastasia A. Lagunovskaya,et al.  Parallel Algorithms and Probability of Large Deviation for Stochastic Convex Optimization Problems , 2018 .

[69]  Mark W. Schmidt,et al.  Linear Convergence of Gradient and Proximal-Gradient Methods Under the Polyak-Łojasiewicz Condition , 2016, ECML/PKDD.