Improved Regret for Zeroth-Order Adversarial Bandit Convex Optimisation

We prove that the minimax regret for zeroth-order adversarial bandit convex optimisation is at most $O(d^{2.5} \sqrt{n} \log(n))$, where $d$ is the dimension and $n$ is the number of interactions. This information-theoretic upper bound improves on the $O(d^{9.5} \sqrt{n} \log(n)^{7.5})$ bound of Bubeck et al. (2017). The proof is based on identifying an improved exploratory distribution for convex functions.
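To make the size of the improvement concrete, dividing the earlier bound $O(d^{9.5} \sqrt{n} \log(n)^{7.5})$ by the new bound $O(d^{2.5} \sqrt{n} \log(n))$ leaves a factor of $d^{7} \log(n)^{6.5}$. The short sketch below evaluates both expressions numerically. It is an illustration only: the constants hidden inside $O(\cdot)$ are not known, so they are set to 1 here as an assumption, and the helper names `regret_bound` and `improvement_factor` are hypothetical.

```python
import math


def regret_bound(d: int, n: int, d_exp: float, log_exp: float) -> float:
    """Evaluate d**d_exp * sqrt(n) * log(n)**log_exp.

    Assumption: the constant hidden inside O(.) is taken to be 1, so the
    result is only meaningful for comparing bounds, not as an absolute value.
    """
    if n < 2:
        raise ValueError("n must be at least 2 so that log(n) > 0")
    return d ** d_exp * math.sqrt(n) * math.log(n) ** log_exp


def improvement_factor(d: int, n: int) -> float:
    """Ratio of the earlier bound O(d^9.5 sqrt(n) log(n)^7.5) to the new
    bound O(d^2.5 sqrt(n) log(n)); algebraically this equals d^7 * log(n)^6.5."""
    old = regret_bound(d, n, d_exp=9.5, log_exp=7.5)
    new = regret_bound(d, n, d_exp=2.5, log_exp=1.0)
    return old / new


if __name__ == "__main__":
    n = 10**6
    for d in (1, 2, 10):
        # Print the dimension, the horizon and the (constant-free) ratio.
        print(d, n, f"{improvement_factor(d, n):.3e}")
```

The $\sqrt{n}$ terms cancel in the ratio, so the gain is entirely in the dependence on the dimension and on the logarithmic factor.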

[1] Xiaowei Hu et al. (Bandit) Convex Optimization with Biased Noisy Gradient Oracles, 2015, AISTATS.

[2] Yin Tat Lee et al. Kernel-based methods for bandit convex optimization, 2016, STOC.

[3] P. Valettas et al. Distances between classical positions of centrally symmetric convex bodies, 2012.

[4] Tor Lattimore et al. An Information-Theoretic Approach to Minimax Regret in Partial Monitoring, 2019, COLT.

[5] Apostolos Giannopoulos et al. Isotropic surface area measures, 1999.

[6] Sébastien Bubeck et al. Exploratory distributions for convex functions, 2018.

[7] Sébastien Bubeck et al. Bandit Convex Optimization: $\sqrt{T}$ Regret in One Dimension, 2015.

[8] Elad Hazan et al. Introduction to Online Convex Optimization, 2016, Found. Trends Optim.

[9] Francesco Orabona. A Modern Introduction to Online Learning, 2019, ArXiv.

[10] Silouanos Brazitikos. Geometry of Isotropic Convex Bodies, 2014.

[11] Thomas P. Hayes et al. Stochastic Linear Optimization under Bandit Feedback, 2008, COLT.

[12] Yuanzhi Li et al. An optimal algorithm for bandit convex optimization, 2016, ArXiv.

[13] Benjamin Van Roy et al. An Information-Theoretic Analysis of Thompson Sampling, 2014, J. Mach. Learn. Res.

[14] Gábor Lugosi et al. Prediction, learning, and games, 2006.

[15] Elad Hazan et al. Bandit Convex Optimization: Towards Tight Bounds, 2014, NIPS.

[16] V. Milman et al. Asymptotic Geometric Analysis, Part I, 2015.

[17] Yuval Peres et al. Bandit Convex Optimization: $\sqrt{T}$ Regret in One Dimension, 2015, COLT.

[18] Benjamin Van Roy et al. Learning to Optimize via Information-Directed Sampling, 2014, NIPS.