In an online linear optimization problem, in each period $t$, an online algorithm chooses $s_t\in\mathcal{S}$ from a fixed (possibly infinite) set $\mathcal{S}$ of feasible decisions. Nature (who may be adversarial) chooses a weight vector $w_t\in\mathbb{R}^n$, and the algorithm incurs cost $c(s_t,w_t)$, where $c$ is a fixed cost function that is linear in the weight vector. In the full-information setting, the vector $w_t$ is then revealed to the algorithm; in the bandit setting, only the cost experienced, $c(s_t,w_t)$, is revealed. The goal of the online algorithm is to perform nearly as well as the best fixed $s\in\mathcal{S}$ in hindsight. Many repeated decision-making problems with weights fit naturally into this framework, such as online shortest-path, online traveling salesman problem (TSP), online clustering, and online weighted set cover. Previously, it was shown how to convert any efficient exact offline optimization algorithm for such a problem into an efficient online algorithm in both the full-information and the bandit settings, with average cost nearly as good as that of the best fixed $s\in\mathcal{S}$ in hindsight. However, in the case where the offline algorithm is an approximation algorithm with ratio $\alpha >1$, the previous approach worked only for special types of approximation algorithms. We show how to convert any offline approximation algorithm for a linear optimization problem into a corresponding online approximation algorithm, with a polynomial blowup in runtime. If the offline algorithm has an $\alpha$-approximation guarantee, then the expected cost of the online algorithm on any sequence is not much larger than $\alpha$ times that of the best $s\in\mathcal{S}$, where the best is chosen with the benefit of hindsight. Our main innovation is combining Zinkevich's algorithm for convex optimization with a geometric transformation that can be applied to any approximation algorithm. Standard techniques generalize the above result to the bandit setting, except that a “barycentric spanner” for the problem is also (provably) necessary as input. Our algorithm can also be viewed as a method for playing large repeated games, where one can compute only approximate best responses, rather than best responses.
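To make the setting concrete, here is a minimal Python sketch of the full-information protocol described above, using Follow-the-Perturbed-Leader, the exact-oracle-to-online conversion of Kalai and Vempala [7] that this abstract contrasts with, rather than the paper's new algorithm. The `offline_oracle` interface, the exponential perturbation with scale `eta`, and the toy "pick one of n actions" instance are illustrative assumptions, not details from the paper.

```python
import numpy as np

def follow_the_perturbed_leader(offline_oracle, weight_sequence, n, eta=1.0, seed=0):
    """Full-information online linear optimization with cost c(s, w) = <phi(s), w>.

    `offline_oracle(w)` must return a feasible decision, encoded as its
    feature vector phi(s) in R^n, that exactly minimizes <phi(s), w>.
    """
    rng = np.random.default_rng(seed)
    cumulative = np.zeros(n)   # sum of all weight vectors revealed so far
    total_cost = 0.0
    for w in weight_sequence:
        perturbation = rng.exponential(scale=eta, size=n)
        s = offline_oracle(cumulative + perturbation)  # oracle on perturbed totals
        total_cost += s @ w    # incur cost c(s_t, w_t); only then is w_t revealed
        cumulative += w
    return total_cost

# Toy instance: S = the n coordinate directions ("pick one of n actions"),
# so the exact offline oracle just selects the cheapest coordinate.
def pick_min_coordinate(w):
    e = np.zeros(len(w))
    e[np.argmin(w)] = 1.0
    return e

T, n = 1000, 5
weights = np.abs(np.random.default_rng(1).normal(size=(T, n)))
alg_cost = follow_the_perturbed_leader(pick_min_coordinate, weights, n, eta=np.sqrt(T))
best_fixed = weights.sum(axis=0).min()  # cost of the best fixed decision in hindsight
print(f"algorithm: {alg_cost:.1f}  vs  best fixed in hindsight: {best_fixed:.1f}")
```

Note that the benchmark is the best fixed decision in hindsight; when the oracle is only an $\alpha$-approximation, this recipe no longer applies in general, and the paper's construction instead guarantees expected cost not much more than $\alpha$ times this benchmark.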
[1] M. X. Goemans and D. P. Williamson. Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. J. ACM, 1995.
[2] R. D. Carr and S. Vempala. Randomized metarounding. Random Struct. Algorithms, 2002.
[3] M. Zinkevich. Online convex programming and generalized infinitesimal gradient ascent. ICML, 2003.
[4] B. Awerbuch and R. D. Kleinberg. Adaptive routing with end-to-end feedback: distributed learning and geometric approaches. STOC, 2004.
[5] H. B. McMahan and A. Blum. Online geometric optimization in the bandit setting against an adaptive adversary. COLT, 2004.
[6] A. D. Flaxman, A. T. Kalai, and H. B. McMahan. Online convex optimization in the bandit setting: gradient descent without a gradient. SODA, 2005.
[7] A. T. Kalai and S. Vempala. Efficient algorithms for online decision problems. J. Comput. Syst. Sci., 2005.
[8] A. Mehta et al. Design is as easy as optimization. SIAM J. Discrete Math., 2006.
[9] V. Dani and T. P. Hayes. Robbing the bandit: less regret in online geometric optimization against an adaptive adversary. SODA, 2006.
[10] M.-F. Balcan and A. Blum. Approximation algorithms and online mechanisms for item pricing. Theory Comput., 2007.
[11] H. Robbins. Some aspects of the sequential design of experiments. Bull. Amer. Math. Soc., 1952.
[12] V. Dani, T. P. Hayes, and S. M. Kakade. The price of bandit information for online optimization. NIPS, 2007.
[13] J. Dunagan and S. Vempala. A simple polynomial-time rescaling algorithm for solving linear programs. Math. Program., 2008.
[14] J. Abernethy, E. Hazan, and A. Rakhlin. Competing in the dark: an efficient algorithm for bandit linear optimization. COLT, 2008.