Bandit Convex Optimization: √T Regret in One Dimension

We analyze the minimax regret of the adversarial bandit convex optimization problem. Focusing on the one-dimensional case, we prove that the minimax regret is Θ̃(√T) and partially resolve a decade-old open problem. Our analysis is non-constructive, as we do not present a concrete algorithm that attains this regret rate. Instead, we use minimax duality to reduce the problem to a Bayesian setting, where the convex loss functions are drawn from a worst-case distribution, and then we solve the Bayesian version of the problem with a variant of Thompson Sampling. Our analysis features a novel use of convexity, formalized as a “local-to-global” property of convex functions, that may be of independent interest.
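
To make the Bayesian setting concrete, the following is a minimal sketch of plain Thompson Sampling on a toy one-dimensional problem. It is not the variant analyzed in the paper. Everything specific here is an assumption made for illustration: the finite hypothesis class f_theta(x) = (x - theta)^2 on [0, 1], the uniform prior, a single loss function fixed across rounds, Gaussian observation noise, and the name thompson_sampling.

```python
import numpy as np

def thompson_sampling(thetas, true_theta, T=2000, noise=0.1, seed=0):
    """Plain Thompson Sampling over a finite class of 1-D convex losses.

    Hypothetical setup: f_theta(x) = (x - theta)^2 on [0, 1], with theta
    drawn from a uniform prior over `thetas`. Each round reveals only a
    noisy value of the loss at the played point (bandit feedback).
    """
    rng = np.random.default_rng(seed)
    log_post = np.zeros(len(thetas))  # uniform prior, kept in log space
    regret = 0.0
    for _ in range(T):
        # Sample a hypothesis from the posterior and play its minimizer.
        post = np.exp(log_post - log_post.max())
        post /= post.sum()
        x = thetas[rng.choice(len(thetas), p=post)]
        # Bandit feedback: a noisy loss value at x, nothing else.
        y = (x - true_theta) ** 2 + noise * rng.normal()
        # Bayes update with a Gaussian likelihood for every hypothesis.
        log_post -= (y - (x - thetas) ** 2) ** 2 / (2 * noise ** 2)
        # The best fixed point in hindsight is true_theta, with loss 0.
        regret += (x - true_theta) ** 2
    return regret

if __name__ == "__main__":
    grid = np.linspace(0.0, 1.0, 11)
    print(thompson_sampling(grid, true_theta=grid[3]))
```

The sketch only shows the sample, play, observe, update loop that Thompson Sampling follows under a known prior. The adversarial problem in the paper is harder: the losses may change every round, and the prior is a worst-case distribution obtained through minimax duality.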
