
The non-stochastic multi-armed bandit problem

In the multi-armed bandit problem, a gambler must decide which arm of $K$ non-identical slot machines to play in a sequence of trials so as to maximize his reward. This classical problem has received much attention because of the simple model it provides of the trade-off between exploration (trying out each arm to find the best one) and exploitation (playing the arm believed to give the best payoff). Past solutions for the bandit problem have almost always relied on assumptions about the statistics of the slot machines. In this work, we make no statistical assumptions whatsoever about the nature of the process generating the payoffs of the slot machines. We give a solution to the bandit problem in which an adversary, rather than a well-behaved stochastic process, has complete control over the payoffs. In a sequence of $T$ plays, we prove that the per-round payoff of our algorithm approaches that of the best arm at the rate $O(T^{-1/2})$. We show by a matching lower bound that this is the best possible. We also prove that our algorithm approaches the per-round payoff of any set of strategies at a similar rate: if the best strategy is chosen from a pool of $N$ strategies, then our algorithm approaches the per-round payoff of that strategy at the rate $O((\log N)^{1/2} T^{-1/2})$. Finally, we apply our results to the problem of playing an unknown repeated matrix game. We show that our algorithm approaches the minimax payoff of the unknown game at the rate $O(T^{-1/2})$.
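The abstract does not spell out the algorithm. As an illustration only, the sketch below implements an exponential-weights bandit strategy of the kind commonly called Exp3; it is not the paper's exact pseudocode or tuning. Assumptions not stated above: rewards lie in [0, 1], `reward_fn` is a hypothetical stand-in for the adversary, and the exploration rate `gamma` is simply a fixed input (the tuning needed for the stated rate guarantees is not reproduced). It shows how exploration (a uniform `gamma / n_arms` share of probability) is mixed with exploitation (exponential weights), and how only the payoff of the arm actually played is used, divided by its probability to give an unbiased estimate without any statistical model of the payoffs.

```python
import math
import random
from typing import Callable, List


def exp3(
    reward_fn: Callable[[int, int], float],
    n_arms: int,
    n_rounds: int,
    gamma: float,
    rng: random.Random,
) -> List[float]:
    """Exp3-style sketch; returns the reward received in each round.

    reward_fn(t, arm) is a hypothetical callback: the reward in [0, 1] that
    the (possibly adversarial) environment assigns to `arm` at round t.
    Only the reward of the arm actually played is observed.
    """
    if not 0.0 < gamma <= 1.0:
        raise ValueError("gamma must be in (0, 1]")
    weights = [1.0] * n_arms
    received: List[float] = []
    for t in range(n_rounds):
        total = sum(weights)
        # Mix exploitation (normalized weights) with uniform exploration.
        probs = [(1.0 - gamma) * w / total + gamma / n_arms for w in weights]
        arm = rng.choices(range(n_arms), weights=probs)[0]
        reward = reward_fn(t, arm)
        received.append(reward)
        # Importance-weighted estimate: unbiased for the played arm's reward;
        # unplayed arms implicitly get an estimate of 0 and keep their weight.
        estimate = reward / probs[arm]
        weights[arm] *= math.exp(gamma * estimate / n_arms)
        # Rescale to avoid float overflow; probabilities are unchanged.
        top = max(weights)
        weights = [w / top for w in weights]
    return received


if __name__ == "__main__":
    # Toy oblivious environment: fixed per-arm rewards, arm 2 is best.
    table = [0.2, 0.5, 0.9]
    rewards = exp3(lambda t, arm: table[arm], n_arms=3, n_rounds=5000,
                   gamma=0.1, rng=random.Random(0))
    print(f"average reward: {sum(rewards) / len(rewards):.3f}")
```

The fixed-reward environment in the usage example is only a smoke test; the point of the non-stochastic setting is that `reward_fn` may choose payoffs arbitrarily, and the same code runs unchanged.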

Y. Freund, R. Schapire, P. Auer
