Gradient-free optimization of highly smooth functions: improved analysis and a new algorithm

This work studies minimization problems with zero-order noisy oracle information under the assumption that the objective function is highly smooth and possibly satisfies additional properties. We consider two kinds of zero-order projected gradient descent algorithms, which differ in the form of the gradient estimator. The first algorithm uses a gradient estimator based on randomization over the $\ell_2$ sphere due to Bach and Perchet (2016). We present an improved analysis of this algorithm on the class of highly smooth and strongly convex functions studied in the prior work, and we derive rates of convergence for two more general classes of non-convex functions: highly smooth functions satisfying the Polyak-{\L}ojasiewicz condition and highly smooth functions with no additional property. The second algorithm is based on randomization over the $\ell_1$ sphere; it extends to the highly smooth setting the algorithm recently proposed for Lipschitz convex functions by Akhavan et al. (2022). We show that, in the case of a noiseless oracle, this novel algorithm enjoys better bounds on bias and variance than the $\ell_2$ randomization and the commonly used Gaussian randomization, while in the noisy case both the $\ell_1$ and $\ell_2$ algorithms benefit from similar improved theoretical guarantees. The improvements are achieved thanks to a new proof technique based on Poincar\'e-type inequalities for uniform distributions on the $\ell_1$ or $\ell_2$ spheres. The results are established under weak (almost adversarial) assumptions on the noise. Moreover, we provide minimax lower bounds proving optimality or near optimality of the obtained upper bounds in several cases.
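For intuition, the following is a minimal Python sketch of the two randomized two-point gradient estimators and the projected update described above, in the noiseless setting. The $\ell_1$-sphere sampler, the toy objective, the ball constraint, and all function names are illustrative assumptions; in particular, the estimators analyzed in the paper for highly smooth functions also involve a smoothing kernel and carefully tuned parameters, which are omitted here.

import numpy as np

def sample_l2_sphere(d, rng):
    # Uniform direction on the unit l2 sphere: normalize a standard Gaussian vector.
    z = rng.standard_normal(d)
    return z / np.linalg.norm(z)

def sample_l1_sphere(d, rng):
    # A common recipe for the unit l1 sphere: exponential magnitudes with random signs
    # (assumed sampler; the paper may use a different construction).
    e = rng.exponential(size=d)
    s = rng.choice([-1.0, 1.0], size=d)
    return s * e / e.sum()

def grad_l2(f, x, h, rng):
    # Two-point l2-randomized estimator: (d / 2h) * (f(x + h*zeta) - f(x - h*zeta)) * zeta.
    # The highly smooth version additionally weights the difference by a kernel; omitted here.
    d = x.size
    zeta = sample_l2_sphere(d, rng)
    return d / (2.0 * h) * (f(x + h * zeta) - f(x - h * zeta)) * zeta

def grad_l1(f, x, h, rng):
    # Two-point l1-randomized estimator in the spirit of Akhavan et al. (2022):
    # (d / 2h) * (f(x + h*zeta) - f(x - h*zeta)) * sign(zeta), with zeta on the l1 sphere.
    d = x.size
    zeta = sample_l1_sphere(d, rng)
    return d / (2.0 * h) * (f(x + h * zeta) - f(x - h * zeta)) * np.sign(zeta)

def project_ball(x, radius=10.0):
    # Euclidean projection onto a ball, standing in for projection onto the constraint set.
    n = np.linalg.norm(x)
    return x if n <= radius else radius * x / n

rng = np.random.default_rng(0)
f = lambda x: 0.5 * float(x @ x)                 # toy smooth, strongly convex objective
x = np.full(5, 3.0)
for t in range(1, 2001):
    g = grad_l1(f, x, h=1e-3, rng=rng)           # or grad_l2(f, x, h=1e-3, rng=rng)
    x = project_ball(x - (0.5 / t) * g)          # projected step with decreasing step size
print(np.linalg.norm(x))                         # distance to the minimizer 0 should be small

On this quadratic both estimators are unbiased for the gradient (since $\mathbb{E}[\zeta\zeta^\top] = I/d$ on the $\ell_2$ sphere and $\mathbb{E}[\mathrm{sign}(\zeta)\zeta^\top] = I/d$ on the $\ell_1$ sphere), which is why the plain projected step converges in this sketch.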

[1] A. Tsybakov, et al. A gradient estimator via L1-randomization for online zero-order optimization with two point feedback, 2022, NeurIPS.

[2] A. Gasnikov, et al. Improved exploitation of higher order smoothness in derivative-free optimization, 2022, Optimization Letters.

[3] Massimiliano Pontil, et al. Distributed Zero-Order Optimization under Adversarial Noise, 2021, NeurIPS.

[4] A. Tsybakov, et al. Exploiting Higher Order Smoothness in Derivative-free Optimization and Continuous Bandits, 2020, NeurIPS.

[5] John C. Duchi, et al. Lower bounds for non-convex stochastic optimization, 2019, Mathematical Programming.

[6] M. V. Balashov, et al. Gradient Projection and Conditional Gradient Methods for Constrained Nonconvex Minimization, 2019, Numerical Functional Analysis and Optimization.

[7] Yurii Nesterov, et al. Lectures on Convex Optimization, 2018.

[8] Krishnakumar Balasubramanian, et al. Zeroth-Order Nonconvex Stochastic Optimization: Handling Constraints, High Dimensionality, and Saddle Points, 2018, Foundations of Computational Mathematics.

[9] Yair Carmon, et al. Lower bounds for finding stationary points I, 2017, Mathematical Programming.

[10] Yurii Nesterov, et al. Random Gradient-Free Minimization of Convex Functions, 2015, Foundations of Computational Mathematics.

[11] L. Rosasco, et al. Convergence of the forward-backward algorithm: beyond the worst-case with the help of geometry, 2017, Mathematical Programming.

[12] Alexander V. Gasnikov, et al. Gradient-free proximal methods with inexact oracle for convex stochastic nonsmooth optimization problems on the simplex, 2016, Automation and Remote Control.

[13] Mark W. Schmidt, et al. Linear Convergence of Gradient and Proximal-Gradient Methods Under the Polyak-Łojasiewicz Condition, 2016, ECML/PKDD.

[14] Vianney Perchet, et al. Highly-Smooth Zero-th Order Online Optimization, 2016, COLT.

[15] Sébastien Bubeck. Convex Optimization: Algorithms and Complexity, 2014, Found. Trends Mach. Learn.

[16] Martin J. Wainwright, et al. Optimal Rates for Zero-Order Convex Optimization: The Power of Two Function Evaluations, 2013, IEEE Transactions on Information Theory.

[17] Saeed Ghadimi, et al. Stochastic First- and Zeroth-Order Methods for Nonconvex Stochastic Programming, 2013, SIAM J. Optim.

[18] Robert D. Nowak, et al. Query Complexity of Derivative-Free Optimization, 2012, NIPS.

[19] Sham M. Kakade, et al. Stochastic Convex Optimization with Bandit Feedback, 2011, SIAM J. Optim.

[20] Lin Xiao, et al. Optimal Algorithms for Online Convex Optimization with Multi-Point Bandit Feedback, 2010, COLT.

[21] S. Mendelson, et al. A probabilistic approach to the geometry of the $\ell_p^n$-ball, 2005, math/0503650.

[22] Adam Tauman Kalai, et al. Online convex optimization in the bandit setting: gradient descent without a gradient, 2004, SODA '05.

[23] J. Dippon, et al. Accelerated randomized stochastic optimization, 2003.

[24] J. Zinn, et al. On the Volume of the Intersection of Two $L_p^n$ Balls, 1989, math/9201206.

[25] R. Osserman. The isoperimetric inequality, 1978.

[26] V. Fabian. Stochastic Approximation of Minima with Improved Asymptotic Speed, 1967.

[27] Tor Lattimore, et al. Improved Regret for Zeroth-Order Stochastic Convex Bandits, 2021, COLT.

[28] Arkadi Nemirovski, et al. Topics in Non-Parametric Statistics, 2000.

[29] John Darzentas, et al. Problem Complexity and Method Efficiency in Optimization, 1983.

[30] Boris Polyak. Gradient methods for the minimisation of functionals, 1963.