Paper information - Improved Regret for Zeroth-Order Adversarial Bandit Convex Optimisation

Improved Regret for Zeroth-Order Adversarial Bandit Convex Optimisation

We prove that the information-theoretic upper bound on the minimax regret for zeroth-order adversarial bandit convex optimisation is at most $O(d^{2.5} \sqrt{n} \log(n))$, where $d$ is the dimension and $n$ is the number of interactions. This improves on the $O(d^{9.5} \sqrt{n} \log(n)^{7.5})$ bound of Bubeck et al. (2017). The proof is based on identifying an improved exploratory distribution for convex functions.
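
As a rough illustration of the size of the improvement, the two stated rates can be compared directly. This ignores the constants hidden by the $O(\cdot)$ notation, so it compares the rates only and not actual regret values. The ratio of the earlier bound to the new one is

$$\frac{d^{9.5} \sqrt{n} \log(n)^{7.5}}{d^{2.5} \sqrt{n} \log(n)} = d^{7} \log(n)^{6.5}.$$

The $\sqrt{n}$ dependence is unchanged. The gain is a factor of $d^{7}$ in the dimension and $\log(n)^{6.5}$ in the number of interactions.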

Tor Lattimore

[1] Xiaowei Hu, et al. (Bandit) Convex Optimization with Biased Noisy Gradient Oracles, 2015, AISTATS.

[2] Yin Tat Lee, et al. Kernel-based methods for bandit convex optimization, 2016, STOC.

[3] P. Valettas, et al. Distances between classical positions of centrally symmetric convex bodies, 2012.

[4] Tor Lattimore, et al. An Information-Theoretic Approach to Minimax Regret in Partial Monitoring, 2019, COLT.

[5] Apostolos Giannopoulos, et al. Isotropic surface area measures, 1999.

[6] Sébastien Bubeck, et al. Exploratory distributions for convex functions, 2018.

[7] Sébastien Bubeck, et al. Bandit Convex Optimization: $\sqrt{T}$ Regret in One Dimension, 2015.

[8] Elad Hazan, et al. Introduction to Online Convex Optimization, 2016, Found. Trends Optim.

[9] Francesco Orabona. A Modern Introduction to Online Learning, 2019, ArXiv.

[10] Silouanos Brazitikos. Geometry of Isotropic Convex Bodies, 2014.

[11] Thomas P. Hayes, et al. Stochastic Linear Optimization under Bandit Feedback, 2008, COLT.