We give an algorithm for the bandit version of a very general online optimization problem considered by Kalai and Vempala [1], for the case of an adaptive adversary. In this problem we are given a bounded set S ⊆ ℝⁿ of feasible points. At each time step t, the online algorithm must select a point xₜ ∈ S while, simultaneously, an adversary selects a cost vector cₜ ∈ ℝⁿ. The algorithm then incurs cost cₜ · xₜ. Kalai and Vempala show that even if S is exponentially large (or infinite), so long as we have an efficient algorithm for the offline problem (given c ∈ ℝⁿ, find x ∈ S to minimize c · x) and so long as the cost vectors are bounded, one can efficiently solve the online problem of performing nearly as well as the best fixed x ∈ S in hindsight. The Kalai-Vempala algorithm assumes that the cost vector cₜ is revealed to the algorithm after each time step. In the "bandit" version of the problem, the algorithm observes only its own cost, cₜ · xₜ. Awerbuch and Kleinberg [2] give an algorithm for the bandit version against an oblivious adversary, and an algorithm that works against an adaptive adversary for the special case of the shortest path problem. They leave open the problem of handling an adaptive adversary in the general case. In this paper, we solve this open problem, giving a simple online algorithm for the general bandit problem in the presence of an adaptive adversary. Ignoring a (polynomial) dependence on n, we achieve a regret bound of \(\mathcal{O}(T^{3/4}\sqrt{\ln T})\).
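To make the setting concrete, the following is a minimal sketch of the full-information protocol described above, using the follow-the-perturbed-leader rule of Kalai and Vempala with a brute-force offline oracle. It is not the bandit algorithm of this paper: here the whole cost vector is revealed after each step, whereas in the bandit version only the scalar cost would be observed. The names `offline_oracle`, `fpl_full_information`, and `epsilon`, the finite list `S`, and the uniform perturbation range are illustrative assumptions.

```python
import random


def dot(c, x):
    """Dot product c . x of two equal-length sequences."""
    return sum(ci * xi for ci, xi in zip(c, x))


def offline_oracle(S, c):
    """Offline problem: return the x in S minimizing c . x.

    Brute force over a finite S (an assumption made for illustration).
    """
    return min(S, key=lambda x: dot(c, x))


def fpl_full_information(S, costs, epsilon, seed=0):
    """Follow the perturbed leader with full feedback.

    Returns (total cost of the algorithm, regret against the best fixed x).
    """
    rng = random.Random(seed)
    n = len(costs[0])
    cumulative = [0.0] * n  # sum of the cost vectors seen so far
    total = 0.0
    for c in costs:
        # Perturb the cumulative cost with uniform noise from [0, 1/epsilon]^n,
        # then play the offline optimum for the perturbed vector.
        perturbed = [s + rng.uniform(0.0, 1.0 / epsilon) for s in cumulative]
        x = offline_oracle(S, perturbed)
        total += dot(c, x)
        # Full-information feedback: the whole vector c is revealed here.
        # In the bandit version only the scalar dot(c, x) would be seen.
        cumulative = [s + ci for s, ci in zip(cumulative, c)]
    best_fixed = min(dot(cumulative, x) for x in S)
    return total, total - best_fixed
```

For example, with `S` taken to be the three unit vectors in ℝ³ and a sequence of cost vectors with entries in [0, 1], `fpl_full_information(S, costs, epsilon=0.1)` returns the algorithm's total cost together with its regret relative to the best fixed point in hindsight.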
[1] D. Blackwell. An analog of the minimax theorem for vector payoffs. 1956.
[2] Nicolò Cesa-Bianchi, et al. Gambling in a rigged casino: The adversarial multi-armed bandit problem. Proceedings of IEEE 36th Annual Foundations of Computer Science, 1995.
[3] Patrick Brézillon, et al. Lecture Notes in Artificial Intelligence. 1999.
[4] Russ Bubley, et al. Randomized algorithms. CSUR, 1995.
[5] Peter Auer, et al. The Nonstochastic Multiarmed Bandit Problem. SIAM J. Comput., 2002.
[6] Manfred K. Warmuth, et al. Path Kernels and Multiplicative Updates. J. Mach. Learn. Res., 2002.
[7] Martin Zinkevich, et al. Online Convex Programming and Generalized Infinitesimal Gradient Ascent. ICML, 2003.
[8] Baruch Awerbuch, et al. Adaptive routing with end-to-end feedback: distributed learning and geometric approaches. STOC '04, 2004.
[9] Santosh S. Vempala, et al. Efficient algorithms for online decision problems. Journal of Computer and System Sciences, 2005.
[10] Thomas P. Hayes, et al. Robbing the bandit: less regret in online geometric optimization against an adaptive adversary. SODA '06, 2006.