In an online linear optimization problem, in each period $t$, an online algorithm chooses $s_t\in\mathcal{S}$ from a fixed (possibly infinite) set $\mathcal{S}$ of feasible decisions. Nature (who may be adversarial) chooses a weight vector $w_t\in\mathbb{R}^n$, and the algorithm incurs cost $c(s_t,w_t)$, where $c$ is a fixed cost function that is linear in the weight vector. In the full-information setting, the vector $w_t$ is then revealed to the algorithm; in the bandit setting, only the cost experienced, $c(s_t,w_t)$, is revealed. The goal of the online algorithm is to perform nearly as well as the best fixed $s\in\mathcal{S}$ in hindsight. Many repeated decision-making problems with weights fit naturally into this framework, such as online shortest-path, online traveling salesman problem (TSP), online clustering, and online weighted set cover. Previously, it was shown how to convert any efficient exact offline optimization algorithm for such a problem into an efficient online algorithm in both the full-information and the bandit settings, with average cost nearly as good as that of the best fixed $s\in\mathcal{S}$ in hindsight. However, in the case where the offline algorithm is an approximation algorithm with ratio $\alpha >1$, the previous approach worked only for special types of approximation algorithms. We show how to convert any offline approximation algorithm for a linear optimization problem into a corresponding online approximation algorithm, with a polynomial blowup in runtime. If the offline algorithm has an $\alpha$-approximation guarantee, then the expected cost of the online algorithm on any sequence is not much larger than $\alpha$ times that of the best $s\in\mathcal{S}$, where the best is chosen with the benefit of hindsight. Our main innovation is combining Zinkevich's algorithm for convex optimization with a geometric transformation that can be applied to any approximation algorithm. Standard techniques generalize the above result to the bandit setting, except that a “barycentric spanner” for the problem is also (provably) necessary as input. Our algorithm can also be viewed as a method for playing large repeated games, where one can compute only approximate best responses, rather than best responses.
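To make the setting concrete, here is a minimal Python sketch of the full-information protocol described above, using Follow-the-Perturbed-Leader, the exact-oracle-to-online conversion of Kalai and Vempala [7] that this abstract contrasts with, rather than the paper's new algorithm. The `offline_oracle` interface, the exponential perturbation with scale `eta`, and the toy "pick one of n actions" instance are illustrative assumptions, not details from the paper.

```python
import numpy as np

def follow_the_perturbed_leader(offline_oracle, weight_sequence, n, eta=1.0, seed=0):
    """Full-information online linear optimization with cost c(s, w) = <phi(s), w>.

    `offline_oracle(w)` must return a feasible decision, encoded as its
    feature vector phi(s) in R^n, that exactly minimizes <phi(s), w>.
    """
    rng = np.random.default_rng(seed)
    cumulative = np.zeros(n)   # sum of all weight vectors revealed so far
    total_cost = 0.0
    for w in weight_sequence:
        perturbation = rng.exponential(scale=eta, size=n)
        s = offline_oracle(cumulative + perturbation)  # oracle on perturbed totals
        total_cost += s @ w    # incur cost c(s_t, w_t); only then is w_t revealed
        cumulative += w
    return total_cost

# Toy instance: S = the n coordinate directions ("pick one of n actions"),
# so the exact offline oracle just selects the cheapest coordinate.
def pick_min_coordinate(w):
    e = np.zeros(len(w))
    e[np.argmin(w)] = 1.0
    return e

T, n = 1000, 5
weights = np.abs(np.random.default_rng(1).normal(size=(T, n)))
alg_cost = follow_the_perturbed_leader(pick_min_coordinate, weights, n, eta=np.sqrt(T))
best_fixed = weights.sum(axis=0).min()  # cost of the best fixed decision in hindsight
print(f"algorithm: {alg_cost:.1f}  vs  best fixed in hindsight: {best_fixed:.1f}")
```

Note that the benchmark is the best fixed decision in hindsight; when the oracle is only an $\alpha$-approximation, this recipe no longer applies in general, and the paper's construction instead guarantees expected cost not much more than $\alpha$ times this benchmark.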
[1] M. X. Goemans and D. P. Williamson. Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. J. ACM, 1995.
[2] R. D. Carr and S. Vempala. Randomized metarounding. Random Struct. Algorithms, 2002.
[3] M. Zinkevich. Online convex programming and generalized infinitesimal gradient ascent. ICML, 2003.
[4] B. Awerbuch and R. D. Kleinberg. Adaptive routing with end-to-end feedback: distributed learning and geometric approaches. STOC, 2004.
[5] H. B. McMahan and A. Blum. Online geometric optimization in the bandit setting against an adaptive adversary. COLT, 2004.
[6] A. D. Flaxman, A. T. Kalai, and H. B. McMahan. Online convex optimization in the bandit setting: gradient descent without a gradient. SODA, 2005.
[7] A. T. Kalai and S. Vempala. Efficient algorithms for online decision problems. J. Comput. Syst. Sci., 2005.
[8] A. Mehta et al. Design is as easy as optimization. SIAM J. Discrete Math., 2006.
[9] V. Dani and T. P. Hayes. Robbing the bandit: less regret in online geometric optimization against an adaptive adversary. SODA, 2006.
[10] M.-F. Balcan and A. Blum. Approximation algorithms and online mechanisms for item pricing. Theory Comput., 2007.
[11] H. Robbins. Some aspects of the sequential design of experiments. Bull. Amer. Math. Soc., 1952.
[12] V. Dani, T. P. Hayes, and S. M. Kakade. The price of bandit information for online optimization. NIPS, 2007.
[13] J. Dunagan and S. Vempala. A simple polynomial-time rescaling algorithm for solving linear programs. Math. Program., 2008.
[14] J. Abernethy, E. Hazan, and A. Rakhlin. Competing in the dark: an efficient algorithm for bandit linear optimization. COLT, 2008.