Playing Games with Approximation Algorithms

In an online linear optimization problem, on each period $t$, an online algorithm chooses $s_t\in\mathcal{S}$ from a fixed (possibly infinite) set $\mathcal{S}$ of feasible decisions. Nature (who may be adversarial) chooses a weight vector $w_t\in\mathbb{R}^n$, and the algorithm incurs cost $c(s_t,w_t)$, where $c$ is a fixed cost function that is linear in the weight vector. In the full-information setting, the vector $w_t$ is then revealed to the algorithm, and in the bandit setting, only the cost experienced, $c(s_t,w_t)$, is revealed. The goal of the online algorithm is to perform nearly as well as the best fixed $s\in\mathcal{S}$ in hindsight. Many repeated decision-making problems with weights fit naturally into this framework, such as online shortest-path, online traveling salesman problem (TSP), online clustering, and online weighted set cover.

Previously, it was shown how to convert any efficient exact offline optimization algorithm for such a problem into an efficient online algorithm in both the full-information and the bandit settings, with average cost nearly as good as that of the best fixed $s\in\mathcal{S}$ in hindsight. However, in the case where the offline algorithm is an approximation algorithm with ratio $\alpha >1$, the previous approach worked only for special types of approximation algorithms. We show how to convert any offline approximation algorithm for a linear optimization problem into a corresponding online approximation algorithm, with a polynomial blowup in runtime. If the offline algorithm has an $\alpha$-approximation guarantee, then the expected cost of the online algorithm on any sequence is not much larger than $\alpha$ times that of the best $s\in\mathcal{S}$, where the best is chosen with the benefit of hindsight. Our main innovation is combining Zinkevich's algorithm for convex optimization with a geometric transformation that can be applied to any approximation algorithm.
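The full-information setting above can be made concrete with a minimal sketch of Zinkevich-style online gradient descent for a linear cost $c(s,w)=\langle s,w\rangle$: after each round, step against the observed weight vector and project back onto the feasible set. The box-shaped feasible set, step size, and all names here are illustrative assumptions, not the paper's construction (which additionally handles decision sets reachable only through an approximation oracle).

```python
import numpy as np

def online_gradient_descent(w_seq, project, x0, eta=0.1):
    """Play x_t, observe w_t, then update x_{t+1} = project(x_t - eta * w_t).

    For a linear cost <x, w>, the gradient with respect to x is just w,
    so no explicit differentiation is needed.
    """
    x = np.asarray(x0, dtype=float)
    plays = []
    for w in w_seq:
        plays.append(x.copy())
        x = project(x - eta * np.asarray(w))
    return plays

# Toy run: feasible set is the unit box [0, 1]^2 (an illustrative choice).
project_box = lambda x: np.clip(x, 0.0, 1.0)
rng = np.random.default_rng(0)
ws = [rng.uniform(-1, 1, size=2) for _ in range(200)]
plays = online_gradient_descent(ws, project_box, x0=[0.5, 0.5], eta=0.05)

alg_cost = sum(float(x @ w) for x, w in zip(plays, ws))
# For a linear cost over a box, the best fixed decision in hindsight
# is attained at one of the corners.
best_fixed = min(float(np.array(c) @ np.sum(ws, axis=0))
                 for c in [(0, 0), (0, 1), (1, 0), (1, 1)])
```

With diameter $D$ and gradient norms $G$ bounded, the standard analysis gives regret at most $D^2/(2\eta)+\eta T G^2/2$, so `alg_cost` stays within an additive $O(\sqrt{T})$-scale term of `best_fixed` for a well-chosen $\eta$.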
Standard techniques generalize the above result to the bandit setting, except that a “barycentric spanner” for the problem is also (provably) necessary as input. Our algorithm can also be viewed as a method for playing large repeated games, where one can compute only approximate best responses, rather than best responses.
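The role of the barycentric spanner in the bandit setting can be seen in a toy sketch: the scalar costs observed when playing the spanner's elements determine the unknown weight vector, which is the information from which the standard unbiased estimators are built. The matrix `B` and the values below are hypothetical, chosen only to illustrate the linear-algebra step.

```python
import numpy as np

# Suppose {b_1, b_2} spans the decision space (rows of B). Bandit feedback
# on those two plays is the pair of scalar costs <b_i, w>. Since B is
# invertible, those costs pin down w exactly: solve B w = observed.
B = np.array([[1.0, 0.0],
              [1.0, 1.0]])          # illustrative spanner for a toy problem
w_true = np.array([0.3, -0.7])      # hidden weight vector (unknown to the player)
observed = B @ w_true               # costs revealed on the spanner plays
w_recovered = np.linalg.solve(B, observed)
```

A barycentric spanner additionally guarantees that every decision is a bounded-coefficient combination of the $b_i$, which keeps the variance of the resulting estimates under control; this sketch shows only the identifiability step.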
