Paper Information - An Asymptotically Optimal Algorithm for the Max k-Armed Bandit Problem

An Asymptotically Optimal Algorithm for the Max k-Armed Bandit Problem

We present an asymptotically optimal algorithm for the max variant of the k-armed bandit problem. Given a set of k slot machines, each yielding payoff from a fixed (but unknown) distribution, we wish to allocate trials to the machines so as to maximize the expected maximum payoff received over a series of n trials. Subject to certain distributional assumptions, we show that O(k ln(k/δ) ln(n)²/ε²) trials are sufficient to identify, with probability at least 1 - δ, a machine whose expected maximum payoff is within ε of optimal. This result leads to a strategy for solving the problem that is asymptotically optimal in the following sense: the gap between the expected maximum payoff obtained by using our strategy for n trials and that obtained by pulling the single best arm for all n trials approaches zero as n → ∞.
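To make the problem setting concrete, the following minimal Python sketch evaluates the stated trial bound and simulates the max-payoff objective. It is an illustration under stated assumptions, not the paper's algorithm: the constant `c` in `trial_bound` is hypothetical (the abstract gives only the O(·) rate), and `explore_then_commit` is a naive baseline chosen here purely to show what "maximum payoff over n trials" means. All function and parameter names are invented for this example.

```python
import math
import random


def trial_bound(k, delta, n, eps, c=1.0):
    """Evaluate c * k * ln(k/delta) * ln(n)^2 / eps^2.

    The constant c is a placeholder assumption; the abstract states only
    the asymptotic rate, not the constant.
    """
    return c * k * math.log(k / delta) * math.log(n) ** 2 / eps ** 2


def explore_then_commit(arms, n, explore_per_arm, rng):
    """Naive baseline (NOT the paper's strategy).

    Pull every arm explore_per_arm times (must be >= 1), commit the
    remaining trials to the arm with the largest observed payoff, and
    return the maximum payoff seen over all trials.
    """
    best_seen = -math.inf
    best_arm, best_arm_max = 0, -math.inf
    pulls = 0
    for i, sample in enumerate(arms):
        arm_max = max(sample(rng) for _ in range(explore_per_arm))
        pulls += explore_per_arm
        if arm_max > best_arm_max:
            best_arm, best_arm_max = i, arm_max
        best_seen = max(best_seen, arm_max)
    # Spend whatever budget is left on the arm that looked best.
    for _ in range(n - pulls):
        best_seen = max(best_seen, arms[best_arm](rng))
    return best_seen


if __name__ == "__main__":
    rng = random.Random(0)
    # Illustrative payoff distributions only; the paper's distributional
    # assumptions are not specified in the abstract.
    arms = [
        lambda r: r.gauss(0.0, 1.0),
        lambda r: r.gauss(0.0, 2.0),
    ]
    print("bound:", trial_bound(k=2, delta=0.05, n=1000, eps=0.1))
    print("max payoff:", explore_then_commit(arms, 1000, 50, rng))
```

Note that the objective rewards the single largest payoff rather than the sum, so an arm with a heavier upper tail can be preferable even when its mean is no higher. The bound also grows as 1/ε², so halving ε quadruples the number of trials it prescribes.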

Matthew J. Streeter | Stephen F. Smith

[1] P. W. Jones, et al. Bandit Problems, Sequential Allocation of Experiments, 1987.

[2] Leslie Pack Kaelbling, et al. Learning in embedded systems, 1993.

[3] Philip W. L. Fong. A Quantitative Study of Hypothesis Selection, 1995, ICML.

[4] Eric P. Smith, et al. An Introduction to Statistical Modeling of Extreme Values, 2002, Technometrics.

[5] Stephen F. Smith, et al. Heuristic Selection for Stochastic Search Optimization: Modeling Solution Quality by Extreme Value Theory, 2004, CP.

[6] Stephen F. Smith, et al. The Max K-Armed Bandit: A New Model of Exploration Applied to Search Heuristic Selection, 2005, AAAI.

[7] H. Robbins. Some aspects of the sequential design of experiments, 1952.