An Accelerated Directional Derivative Method for Smooth Stochastic Convex Optimization

Abstract We consider smooth stochastic convex optimization problems in the context of algorithms based on directional derivatives of the objective function. This setting can be viewed as intermediate between derivative-free optimization and gradient-based optimization. We assume that, at any given point and for any given direction, a stochastic approximation of the directional derivative of the objective at that point and in that direction is available with some additive noise. The noise is of unknown nature but bounded in absolute value. We emphasize that we consider directional derivatives along arbitrary directions, as opposed to coordinate descent methods, which use only derivatives along coordinate directions. For this setting, we propose a non-accelerated and an accelerated directional derivative method and provide their complexity bounds. Our non-accelerated algorithm has a complexity bound similar to that of the gradient-based algorithm, that is, without any dimension-dependent factor. The complexity bound of our accelerated algorithm coincides with that of the accelerated gradient-based algorithm up to a factor of the square root of the problem dimension. We extend these results to strongly convex problems.
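As a rough illustration of the oracle model and of a non-accelerated directional derivative method of the kind described above, the sketch below assumes a noisy directional-derivative oracle and builds a gradient estimate from directions sampled uniformly on the Euclidean unit sphere. The oracle, the smoothness constant L, and the step-size rule are illustrative placeholders, not the paper's exact algorithm.

```python
import numpy as np

def directional_derivative_oracle(f_grad, x, e, noise_level=1e-3):
    """Noisy directional derivative <grad f(x), e> + xi with |xi| <= noise_level.
    Here f_grad returns the exact gradient for simulation purposes; in the
    paper's setting only the noisy scalar value would be observable."""
    xi = np.random.uniform(-noise_level, noise_level)
    return float(f_grad(x) @ e) + xi

def non_accelerated_ddm(f_grad, x0, L, n_iters=1000, noise_level=1e-3):
    """Sketch of a non-accelerated directional derivative method:
    sample a direction e uniformly on the unit sphere, query the noisy
    directional derivative along e, and step along e."""
    x = np.array(x0, dtype=float)
    n = x.size
    for _ in range(n_iters):
        e = np.random.randn(n)
        e /= np.linalg.norm(e)            # uniform direction on the unit sphere
        d = directional_derivative_oracle(f_grad, x, e, noise_level)
        g = n * d * e                     # gradient estimate, unbiased up to the noise
        x -= g / (2 * n * L)              # illustrative step-size choice
    return x

# Usage on a simple quadratic f(x) = 0.5 * ||x||^2, so grad f(x) = x and L = 1
if __name__ == "__main__":
    x_final = non_accelerated_ddm(lambda x: x, x0=np.ones(10), L=1.0)
    print(np.linalg.norm(x_final))        # should be close to zero
```

The factor n in the estimate n * d * e compensates for the averaging over random directions (E[e e^T] = I/n), so the estimate matches the gradient in expectation up to the noise; the accelerated variant would combine such estimates with a momentum scheme in the spirit of accelerated gradient methods.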
