Bandit Smooth Convex Optimization: Improving the Bias-Variance Tradeoff

Bandit convex optimization is one of the fundamental problems in the field of online learning. The best algorithm for the general bandit convex optimization problem guarantees a regret of O(T5/6), while the best known lower bound is Ω(T1/2). Many attempts have been made to bridge the huge gap between these bounds. A particularly interesting special case of this problem assumes that the loss functions are smooth. In this case, the best known algorithm guarantees a regret of O(T2/3). We present an efficient algorithm for the bandit smooth convex optimization problem that guarantees a regret of O(T5/8). Our result rules out an Ω(T2/3) lower bound and takes a significant step towards the resolution of this open problem.

[1]  Elad Hazan,et al.  Competing in the Dark: An Efficient Algorithm for Bandit Linear Optimization , 2008, COLT.

[2]  Sham M. Kakade,et al.  Stochastic Convex Optimization with Bandit Feedback , 2011, SIAM J. Optim..

[3]  Yurii Nesterov,et al.  Primal-dual subgradient methods for convex problems , 2005, Math. Program..

[4]  Sébastien Bubeck,et al.  The entropic barrier: a simple and optimal universal self-concordant barrier , 2014, COLT.

[5]  Ambuj Tewari,et al.  Improved Regret Guarantees for Online Smooth Convex Optimization with Bandit Feedback , 2011, AISTATS.

[6]  Sébastien Bubeck,et al.  Multi-scale exploration of convex functions and bandit convex optimization , 2015, COLT.

[7]  Yurii Nesterov,et al.  Interior-point polynomial algorithms in convex programming , 1994, Siam studies in applied mathematics.

[8]  Lin Xiao,et al.  Optimal Algorithms for Online Convex Optimization with Multi-Point Bandit Feedback. , 2010, COLT 2010.

[9]  Jacob D. Abernethy,et al.  Beating the adaptive bandit with high probability , 2009, 2009 Information Theory and Applications Workshop.

[10]  Martin Zinkevich,et al.  Online Convex Programming and Generalized Infinitesimal Gradient Ascent , 2003, ICML.

[11]  Sébastien Bubeck,et al.  Bandit Convex Optimization : √ T Regret in One Dimension , 2015 .

[12]  Thomas P. Hayes,et al.  The Price of Bandit Information for Online Optimization , 2007, NIPS.

[13]  Sébastien Bubeck,et al.  Regret Analysis of Stochastic and Nonstochastic Multi-armed Bandit Problems , 2012, Found. Trends Mach. Learn..

[14]  Shai Shalev-Shwartz,et al.  Online Learning and Online Convex Optimization , 2012, Found. Trends Mach. Learn..

[15]  Adam Tauman Kalai,et al.  Online convex optimization in the bandit setting: gradient descent without a gradient , 2004, SODA '05.

[16]  Elad Hazan,et al.  Bandit Convex Optimization: Towards Tight Bounds , 2014, NIPS.

[17]  Yuval Peres,et al.  Bandit Convex Optimization: \(\sqrt{T}\) Regret in One Dimension , 2015, COLT.

[18]  Yuval Peres,et al.  Bandits with switching costs: T2/3 regret , 2013, STOC.