An optimal algorithm for bandit convex optimization

We consider the problem of online convex optimization against an arbitrary adversary with bandit feedback, known as bandit convex optimization. We give the first $\tilde{O}(\sqrt{T})$-regret algorithm for this setting based on a novel application of the ellipsoid method to online learning. This bound is known to be tight up to logarithmic factors. Our analysis introduces new tools in discrete convex geometry.
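
To make the setting concrete, the following is a minimal Python sketch of the bandit feedback protocol and of regret. It is not the ellipsoid-based algorithm of this paper. As an assumption for illustration, it swaps in the much simpler one-point gradient estimator of Flaxman, Kalai and McMahan (SODA 2005), which is known to achieve only $O(T^{3/4})$ regret. It also assumes a one-dimensional domain $K = [-1, 1]$ and quadratic losses. The function names `one_point_bandit_gd` and `regret` are hypothetical, and the comparator is approximated by a grid search rather than computed exactly.

```python
import random


def project(x, lo, hi):
    """Euclidean projection of a scalar onto the interval [lo, hi]."""
    return max(lo, min(hi, x))


def one_point_bandit_gd(losses, delta, eta, seed=0):
    """Play bandit convex optimization on the hypothetical domain K = [-1, 1].

    NOT the paper's ellipsoid-based method: this is the simpler one-point
    gradient estimator of Flaxman, Kalai and McMahan, specialized to d = 1.
    Each round the player commits to y_t and observes only the scalar f_t(y_t).
    """
    rng = random.Random(seed)
    x = 0.0
    played = []
    for f in losses:
        u = rng.choice((-1.0, 1.0))  # uniform on the unit "sphere" {-1, +1}
        y = x + delta * u            # explore at distance delta from x
        value = f(y)                 # bandit feedback: one loss value, no gradient
        played.append(y)
        g = value * u / delta        # unbiased estimate of the smoothed loss's derivative
        # Keep x inside the shrunk interval so the next y stays inside K.
        x = project(x - eta * g, -1.0 + delta, 1.0 - delta)
    return played


def regret(losses, played, grid_size=401):
    """Loss incurred minus the loss of the best fixed point in K.

    The comparator is approximated by a grid search over K, so this slightly
    underestimates the exact regret.
    """
    incurred = sum(f(y) for f, y in zip(losses, played))
    grid = [-1.0 + 2.0 * i / (grid_size - 1) for i in range(grid_size)]
    best_fixed = min(sum(f(z) for f in losses) for z in grid)
    return incurred - best_fixed


if __name__ == "__main__":
    T = 1000
    # Oblivious adversary: quadratic losses whose minimizer alternates 0.1, 0.5.
    centers = [0.1 if t % 2 == 0 else 0.5 for t in range(T)]
    losses = [lambda x, c=c: (x - c) ** 2 for c in centers]
    # delta ~ T^(-1/4), eta ~ T^(-3/4): the order used in the O(T^{3/4}) analysis.
    played = one_point_bandit_gd(losses, delta=T ** -0.25, eta=T ** -0.75)
    print(f"regret after {T} rounds: {regret(losses, played):.2f}")
```

The exploration radius `delta` is the central design tension in this approach. A larger radius lowers the variance of the gradient estimate, which scales as $1/\delta$, but the player pays roughly $\delta^2$ per round for playing away from its iterate. Balancing these two effects is what limits this simple estimator to $T^{3/4}$ and motivates the different techniques needed for $\tilde{O}(\sqrt{T})$.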

Yuanzhi Li | Elad Hazan
