Stochastic Convex Optimization with Bandit Feedback

This paper addresses the problem of minimizing a convex, Lipschitz function f over a convex, compact set 𝒳 ⊂ ℝ^d under a stochastic bandit feedback model. In this model, the algorithm may query any point x ∈ 𝒳 but observes only a noisy realization of the function value f(x), never a gradient. We present a generalization of the ellipsoid algorithm that incurs O(poly(d)√T) regret after T queries, where d is the dimension and the regret of the query sequence x_1, …, x_T is Σ_{t=1}^{T} (f(x_t) − min_{x∈𝒳} f(x)). Since any algorithm has regret at least Ω(√T) on this problem, our algorithm is optimal in its scaling with T.
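To make the feedback model and the regret measure concrete, here is a minimal Python sketch of the one-dimensional case (d = 1). It is not the paper's ellipsoid-based algorithm: it swaps in a simple noisy ternary search that sees only noisy function values, uses convexity to discard a quarter of the current interval once the confidence intervals at two query points separate, and tallies the regret Σ_t (f(x_t) − f(x*)). The names `make_bandit_oracle`, `noisy_ternary_search` and `cumulative_regret`, the Gaussian noise model, and the parameters `sigma` and `delta` are hypothetical choices made for illustration.

```python
import math
import random


def make_bandit_oracle(f, sigma, seed=0):
    """Assumed noise model: a query at x returns f(x) + N(0, sigma^2) noise."""
    rng = random.Random(seed)

    def query(x):
        return f(x) + rng.gauss(0.0, sigma)

    return query


def noisy_ternary_search(query, lo, hi, T, sigma, delta=0.05):
    """Hypothetical 1-D sketch, not the paper's ellipsoid method.

    Keeps an interval [lo, hi] assumed to contain a minimizer of a convex f.
    Each round samples the quarter points a < b until their confidence
    intervals separate (or the budget T runs out), then drops the outer
    quarter next to the larger estimate. Returns every point queried.
    """
    points = []
    while len(points) < T:
        a = lo + (hi - lo) / 4.0
        b = hi - (hi - lo) / 4.0
        sum_a = sum_b = 0.0
        n = 0
        while len(points) + 2 <= T:
            sum_a += query(a)
            sum_b += query(b)
            points.extend([a, b])
            n += 1
            # Gaussian tail bound: each empirical mean lies within `radius`
            # of its true value with probability >= 1 - delta (per check only;
            # no union bound over repeated checks is taken here).
            radius = sigma * math.sqrt(2.0 * math.log(2.0 / delta) / n)
            if abs(sum_a - sum_b) / n > 2.0 * radius:
                break
        if n == 0:
            # Exactly one query is left: spend it at the interval midpoint.
            mid = (lo + hi) / 2.0
            query(mid)
            points.append(mid)
            break
        if sum_a < sum_b:
            hi = b  # if f(a) < f(b), convexity rules out any minimizer right of b
        else:
            lo = a  # if f(a) >= f(b), some minimizer lies in [a, hi]
    return points


def cumulative_regret(f, points, f_star):
    """Regret of the query sequence: sum over t of f(x_t) - f(x*)."""
    return sum(f(x) - f_star for x in points)


def f(x):
    # Convex test function with minimizer x* = 0.3 and f* = 0.
    return (x - 0.3) ** 2


exact = noisy_ternary_search(make_bandit_oracle(f, 0.0), 0.0, 1.0, T=200, sigma=0.0)
noisy = noisy_ternary_search(make_bandit_oracle(f, 0.1, seed=1), 0.0, 1.0, T=10_000, sigma=0.1)
print(exact[-1], cumulative_regret(f, noisy, 0.0))
```

With `sigma=0.0` the sketch reduces to an exact ternary search and its last query lands essentially on the minimizer 0.3. With noise, the interval stops shrinking once f(a) and f(b) become hard to tell apart, so the remaining budget tends to be spent near the minimizer. The sketch makes no attempt to reproduce the paper's O(poly(d)√T) guarantee.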

Sham M. Kakade | Dean P. Foster | Alekh Agarwal | Alexander Rakhlin | Daniel J. Hsu