Paper information - A Simple Distribution-Free Approach to the Max k-Armed Bandit Problem

The max k-armed bandit problem is a recently introduced online optimization problem with practical applications to heuristic search. Given a set of k slot machines, each yielding payoff from a fixed (but unknown) distribution, we wish to allocate trials to the machines so as to maximize the maximum payoff received over a series of n trials. Previous work on the max k-armed bandit problem has assumed that payoffs are drawn from generalized extreme value (GEV) distributions. In this paper we present a simple algorithm, based on an algorithm for the classical k-armed bandit problem, that solves the max k-armed bandit problem effectively without making strong distributional assumptions. We demonstrate the effectiveness of our approach by applying it to the task of selecting among priority dispatching rules for the resource-constrained project scheduling problem with maximal time lags (RCPSP/max).
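To make the objective concrete, the following is a minimal sketch of the problem setting only, not the algorithm presented in the paper. It assumes a naive epsilon-greedy rule that favours the arm with the highest payoff observed so far; the function name `max_bandit_epsilon_greedy`, the `epsilon` and `seed` parameters, and the two Gaussian arms are all hypothetical choices made for illustration.

```python
import random

def max_bandit_epsilon_greedy(arms, n, epsilon=0.1, seed=0):
    """Illustrative policy (an assumption, not the paper's method).

    Pulls arms for n trials and returns the single best payoff received,
    which is the quantity the max k-armed bandit problem tries to maximize,
    together with the number of pulls given to each arm.
    """
    rng = random.Random(seed)
    k = len(arms)
    best_seen = [float("-inf")] * k  # best payoff observed so far, per arm
    pulls = [0] * k
    for t in range(n):
        if t < k:
            i = t                                   # try every arm once
        elif rng.random() < epsilon:
            i = rng.randrange(k)                    # explore at random
        else:
            i = max(range(k), key=lambda j: best_seen[j])  # exploit
        payoff = arms[i](rng)
        pulls[i] += 1
        best_seen[i] = max(best_seen[i], payoff)
    return max(best_seen), pulls

# Two hypothetical arms: the first has the higher mean, the second the
# heavier spread and therefore the better chance of a large maximum.
arms = [
    lambda rng: rng.gauss(1.0, 0.1),
    lambda rng: rng.gauss(0.0, 2.0),
]
best, pulls = max_bandit_epsilon_greedy(arms, n=1000)
print(best, pulls)
```

The example is meant to show why the objective differs from the classical k-armed bandit: an arm with a lower mean but a wider spread can be the better choice when only the single best payoff counts.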

Matthew J. Streeter | Stephen F. Smith

[1] Eric P. Smith, et al. An Introduction to Statistical Modeling of Extreme Values, 2002, Technometrics.

[2] Rolf H. Möhring, et al. Solving Project Scheduling Problems by Minimum Cut Computations, 2002, Manag. Sci.

[3] P. W. Jones, et al. Bandit Problems, Sequential Allocation of Experiments, 1987.

[4] Leslie Pack Kaelbling, et al. Learning in embedded systems, 1993.

[5] Stephen F. Smith, et al. Heuristic Selection for Stochastic Search Optimization: Modeling Solution Quality by Extreme Value Theory, 2004, CP.

[6] H. Robbins. Some aspects of the sequential design of experiments, 1952.

[7] Stephen F. Smith, et al. The Max K-Armed Bandit: A New Model of Exploration Applied to Search Heuristic Selection, 2005, AAAI.

[8] Stephen F. Smith, et al. An Asymptotically Optimal Algorithm for the Max k-Armed Bandit Problem, 2006, AAAI.

[9] Klaus Neumann, et al. Project Scheduling with Time Windows and Scarce Resources, 2003, Springer Berlin Heidelberg.

[10] Peter Auer, et al. Finite-time Analysis of the Multiarmed Bandit Problem, 2002, Machine Learning.

[11] Peter Auer, et al. The Nonstochastic Multiarmed Bandit Problem, 2002, SIAM J. Comput.

[12] T. Lai. Adaptive treatment allocation and the multi-armed bandit problem, 1987.

[13] Christoph Schwindt, et al. Generation of Resource-Constrained Project Scheduling Problems with Minimal and Maximal Time Lags, 1998.