An optimal algorithm for bandit convex optimization

We consider the problem of online convex optimization against an arbitrary adversary with bandit feedback, known as bandit convex optimization. We give the first $\tilde{O}(\sqrt{T})$-regret algorithm for this setting based on a novel application of the ellipsoid method to online learning. This bound is known to be tight up to logarithmic factors. Our analysis introduces new tools in discrete convex geometry.
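
For orientation, the regret referred to above is usually defined as follows. This is a sketch of the standard formulation; the symbols $K$ (the convex decision set), $f_t$ (the convex loss chosen by the adversary in round $t$), and $x_t$ (the point played by the learner) are notational assumptions and do not appear in the abstract itself. With bandit feedback the learner observes only the scalar $f_t(x_t)$ after each round.

$$
\mathrm{Regret}_T \;=\; \sum_{t=1}^{T} f_t(x_t) \;-\; \min_{x \in K} \sum_{t=1}^{T} f_t(x)
$$

Under this definition, an $\tilde{O}(\sqrt{T})$ bound means the average per-round regret $\mathrm{Regret}_T / T$ decays as $\tilde{O}(1/\sqrt{T})$, where $\tilde{O}$ hides factors logarithmic in $T$.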
