Paper Information - Bandit Smooth Convex Optimization: Improving the Bias-Variance Tradeoff


Bandit convex optimization is one of the fundamental problems in the field of online learning. The best algorithm for the general bandit convex optimization problem guarantees a regret of \(O(T^{5/6})\), while the best known lower bound is \(\Omega(T^{1/2})\). Many attempts have been made to bridge the huge gap between these bounds. A particularly interesting special case of this problem assumes that the loss functions are smooth. In this case, the best known algorithm guarantees a regret of \(O(T^{2/3})\). We present an efficient algorithm for the bandit smooth convex optimization problem that guarantees a regret of \(O(T^{5/8})\). Our result rules out an \(\Omega(T^{2/3})\) lower bound and takes a significant step towards the resolution of this open problem.
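
As an illustrative calculation (not part of the original abstract, and ignoring constants and any logarithmic factors), putting the four exponents over a common denominator shows where the new bound sits and why it excludes an \(\Omega(T^{2/3})\) lower bound:

\[
\frac{1}{2} = \frac{12}{24} \;<\; \frac{5}{8} = \frac{15}{24} \;<\; \frac{2}{3} = \frac{16}{24} \;<\; \frac{5}{6} = \frac{20}{24},
\qquad
\frac{T^{5/8}}{T^{2/3}} = T^{-1/24} \to 0 \quad \text{as } T \to \infty .
\]

Since \(T^{5/8}\) grows strictly more slowly than \(T^{2/3}\), no lower bound of order \(T^{2/3}\) can hold in the smooth case, while a gap of \(T^{1/8}\) remains between the new upper bound and the \(\Omega(T^{1/2})\) lower bound.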
