Paper Information - Bandit Convex Optimization: √T Regret in One Dimension

Bandit Convex Optimization: √T Regret in One Dimension

We analyze the minimax regret of the adversarial bandit convex optimization problem. Focusing on the one-dimensional case, we prove that the minimax regret is Θ̃(√T) and partially resolve a decade-old open problem. Our analysis is non-constructive, as we do not present a concrete algorithm that attains this regret rate. Instead, we use minimax duality to reduce the problem to a Bayesian setting, where the convex loss functions are drawn from a worst-case distribution, and then we solve the Bayesian version of the problem with a variant of Thompson Sampling. Our analysis features a novel use of convexity, formalized as a “local-to-global” property of convex functions, that may be of independent interest.
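As a rough illustration of the Bayesian setting the abstract mentions, the sketch below runs plain Thompson Sampling on a toy one-dimensional problem. It is not the variant analyzed in the paper, and everything in it is an assumption made for illustration: the loss is a single fixed quadratic f(x) = (x - θ)² rather than an adversarial sequence, the prior is uniform over a small finite set of candidate θ values, feedback is the loss value plus Gaussian noise, and the function name `thompson_sampling_1d` and all of its parameters are hypothetical.

```python
import numpy as np


def thompson_sampling_1d(thetas, actions, true_theta, horizon,
                         noise_std=0.1, seed=0):
    """Toy Thompson Sampling for a 1-D convex bandit (illustrative only).

    Assumed model: loss f_theta(x) = (x - theta)^2, theta drawn from a
    uniform prior over `thetas`, noisy bandit feedback y = f(x) + noise.
    Returns the cumulative regret and the final posterior over `thetas`.
    """
    rng = np.random.default_rng(seed)
    thetas = np.asarray(thetas, dtype=float)
    actions = np.asarray(actions, dtype=float)

    log_post = np.zeros(len(thetas))  # uniform prior, kept in log space
    best_loss = np.min((actions - true_theta) ** 2)
    regret = 0.0

    for _ in range(horizon):
        # Normalize the posterior in a numerically stable way.
        post = np.exp(log_post - log_post.max())
        post /= post.sum()

        # Thompson step: sample a hypothesis, play its minimizer.
        sampled = rng.choice(thetas, p=post)
        x = actions[np.argmin((actions - sampled) ** 2)]

        # Bandit feedback: only the (noisy) loss at x is observed.
        true_loss = (x - true_theta) ** 2
        y = true_loss + noise_std * rng.normal()

        # Bayesian update with a Gaussian likelihood for each candidate.
        predicted = (x - thetas) ** 2
        log_post += -0.5 * ((y - predicted) / noise_std) ** 2

        regret += true_loss - best_loss

    post = np.exp(log_post - log_post.max())
    post /= post.sum()
    return regret, post


if __name__ == "__main__":
    regret, post = thompson_sampling_1d(
        thetas=[0.1, 0.3, 0.5, 0.7, 0.9],
        actions=np.linspace(0.0, 1.0, 11),
        true_theta=0.3,
        horizon=300,
        noise_std=0.05,
    )
    print("cumulative regret:", regret)
    print("posterior:", np.round(post, 3))
```

In this toy model a single observation at x cannot distinguish θ from its mirror image 2x - θ, so the posterior only concentrates because the sampled actions vary across rounds. The paper's setting is harder, since the loss functions may change from round to round and the prior is a worst-case distribution.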

Sébastien Bubeck | O. Dekel | Tomer Koren

[1] W. R. Thompson. On the Likelihood that One Unknown Probability Exceeds Another in View of the Evidence of Two Samples, 1933.

[2] M. Sion. On general minimax theorems, 1958.

[3] Robert J. Aumann, et al. 28. Mixed and Behavior Strategies in Infinite Extensive Games, 1964.

[4] H. Komiya. Elementary proof for Sion's minimax theorem, 1988.

[5] Manfred K. Warmuth, et al. Exponentiated Gradient Versus Gradient Descent for Linear Predictors, 1997, Inf. Comput.

[6] Y. Freund, et al. The non-stochastic multi-armed bandit problem, 2001.

[7] Robert D. Kleinberg. Nearly Tight Bounds for the Continuum-Armed Bandit Problem, 2004, NIPS.

[8] Adam Tauman Kalai, et al. Online convex optimization in the bandit setting: gradient descent without a gradient, 2004, SODA '05.

[9] Thomas P. Hayes, et al. The Price of Bandit Information for Online Optimization, 2007, NIPS.

[10] Elad Hazan, et al. Competing in the Dark: An Efficient Algorithm for Bandit Linear Optimization, 2008, COLT.

[11] Peter L. Bartlett, et al. A Stochastic View of Optimal Regret through Minimax Duality, 2009, COLT.