Paper information - Multi-scale exploration of convex functions and bandit convex optimization

Multi-scale exploration of convex functions and bandit convex optimization

We construct a new map from a convex function to a distribution on its domain, with the property that this distribution is a multi-scale exploration of the function. We use this map to solve a decade-old open problem in adversarial bandit convex optimization by showing that the minimax regret for this problem is $\tilde{O}(\mathrm{poly}(n) \sqrt{T})$, where $n$ is the dimension and $T$ the number of rounds. This bound is obtained by studying the dual Bayesian maximin regret via the information ratio analysis of Russo and Van Roy, and then using the multi-scale exploration to solve the Bayesian problem.

Sébastien Bubeck | Ronen Eldan

[1] B. Klartag. On convex perturbations with a bounded isotropic constant, 2006.

[2] Sébastien Bubeck, et al. Bandit Convex Optimization: $\sqrt{T}$ Regret in One Dimension, 2015.

[3] Sham M. Kakade, et al. Stochastic Convex Optimization with Bandit Feedback, 2011, SIAM J. Optim.

[4] Benjamin Van Roy, et al. An Information-Theoretic Analysis of Thompson Sampling, 2014, J. Mach. Learn. Res.

[5] Robert D. Kleinberg. Nearly Tight Bounds for the Continuum-Armed Bandit Problem, 2004, NIPS.

[6] Benjamin Van Roy, et al. Learning to Optimize via Information-Directed Sampling, 2014, NIPS.

[7] Ricardo G. Durán, et al. An optimal Poincaré inequality in $L^1$ for convex domains, 2003.

[8] Sébastien Bubeck, et al. Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems, 2012, Found. Trends Mach. Learn.

[9] Adam Tauman Kalai, et al. Online convex optimization in the bandit setting: gradient descent without a gradient, 2004, SODA '05.