Stochastic Convex Optimization with Bandit Feedback

This paper addresses the problem of minimizing a convex, Lipschitz function f over a convex, compact set χ under a stochastic bandit feedback model. In this model, the algorithm may observe noisy realizations of the function value f(x) at any query point x ∈ χ. We present a generalization of the ellipsoid algorithm that incurs O(poly(d)√T) regret. Since any algorithm has regret at least Ω(√T) on this problem, our algorithm is optimal in terms of its scaling with T.
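To make the feedback model and the notion of regret concrete, here is a minimal Python sketch. It rests on assumptions the abstract does not state: the noise is zero-mean Gaussian with standard deviation `sigma`, and regret is taken to be the usual Σ_{t=1}^T f(x_t) − T·min_{x∈χ} f(x), evaluated with the noiseless f. The helper names (`make_noisy_oracle`, `regret`, `explore_then_commit`) are hypothetical. In particular, `explore_then_commit` is a crude grid baseline on a one-dimensional χ = [0, 1]; it is not the ellipsoid-based method described above and carries no √T guarantee.

```python
import random


def make_noisy_oracle(f, sigma, seed=0):
    """Bandit feedback: a query at x returns f(x) plus zero-mean Gaussian noise.

    Gaussian noise is an assumption made for this sketch."""
    rng = random.Random(seed)

    def query(x):
        return f(x) + rng.gauss(0.0, sigma)

    return query


def regret(f, points, f_star):
    """Regret of a query sequence: sum_t f(x_t) - T * f_star, using the noiseless f."""
    return sum(f(x) for x in points) - len(points) * f_star


def explore_then_commit(query, grid, pulls_per_point, T):
    """Hypothetical naive baseline, NOT the paper's ellipsoid-based method:
    query every grid point pulls_per_point times, then spend the remaining
    budget on the grid point with the smallest empirical mean."""
    points, means = [], []
    for x in grid:
        total = 0.0
        for _ in range(pulls_per_point):
            total += query(x)  # only the noisy value is observed
            points.append(x)
        means.append(total / pulls_per_point)
    best = grid[min(range(len(grid)), key=means.__getitem__)]
    points.extend([best] * max(0, T - len(points)))
    return points[:T]


def f(x):
    """Convex and 1-Lipschitz on [0, 1], minimized at x = 0.3 with f* = 0."""
    return abs(x - 0.3)


query = make_noisy_oracle(f, sigma=0.1, seed=1)
grid = [i / 10 for i in range(11)]
pts = explore_then_commit(query, grid, pulls_per_point=100, T=10_000)
explore_regret = regret(f, pts[: len(grid) * 100], f_star=0.0)
total_regret = regret(f, pts, f_star=0.0)
print(f"exploration regret = {explore_regret:.1f}, total regret = {total_regret:.1f}")
```

The exploration phase alone costs 100 · Σ_i |x_i − 0.3| = 100 · 3.4 = 340 regardless of the noise, because regret is charged on the true f at every query. This is why an algorithm in this setting must balance how much it samples poor points against how confidently it can identify good ones.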
