We utilize and develop elements of the recent achievable region account of Gittins indexation by Bertsimas and Niño-Mora to design index-based policies for discounted multi-armed bandits on parallel machines. The policies analyzed have expected rewards which come within an O(α) quantity of optimality, where α > 0 is a discount rate. In the main, the policies make an initial once-for-all allocation of bandits to machines, with each machine then handling its own workload optimally. This allocation must take careful account of the index structure of the bandits. The corresponding limit policies are average-overtaking optimal.

1. Introduction. Ever since Gittins and Jones (1974) proved the classical result establishing the optimality of Gittins index policies for multi-armed bandits with discounted rewards earned over an infinite horizon, it has been widely believed that policies based on such indices will perform very well when the single machine/server of the Gittins and Jones model is replaced by a collection of identical machines/servers working in parallel. Exploration of such issues goes back to Glazebrook (1976). It has become clear that parallel machine stochastic scheduling problems are in general much less tractable than their single machine counterparts. See, for example, Weber (1982), Weber, Varaiya and Walrand (1986) and Weiss (1990, 1992). Weiss (1995) has recently given an account of index policies for a problem involving the scheduling of a batch of stochastic jobs on parallel machines with a linear holding cost objective. New approaches to the analysis of such problems have emerged from recent research on the so-called achievable region approach to stochastic optimization. This approach develops solutions to stochastic optimization problems by (1) characterizing the space of all possible performances (the achievable region) of the system of interest, and (2) optimizing the overall system-wide objective over this space.
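The two-step recipe above can be made concrete with a toy illustration (not taken from the paper; all constants are invented). For a two-class work-conserving system, the achievable region of mean-backlog performance vectors is the segment between the two "strict priority" vertices, so optimizing a linear holding cost over the region amounts to comparing priority rules:

```python
# Hedged sketch of the achievable-region approach for a hypothetical
# two-class work-conserving system. Constants are illustrative only.
v1, v2, V = 2.0, 3.0, 8.0   # per-class performance lower bounds; conserved total
c1, c2 = 4.0, 1.0           # linear holding costs on the performance vector

# Vertices of the achievable region correspond to priority policies:
# the favored class attains its lower bound, the other absorbs the rest.
vertices = {
    "priority to class 1": (v1, V - v1),
    "priority to class 2": (V - v2, v2),
}

# Step (2): optimize the linear system-wide objective over the region.
# A linear objective is minimized at a vertex, i.e., at a priority rule.
best = min(vertices.items(), key=lambda kv: c1 * kv[1][0] + c2 * kv[1][1])
print(best)  # -> ('priority to class 1', (2.0, 6.0))
```

The point of the sketch is structural rather than numerical: because the achievable region is a polytope defined by conservation laws, a linear objective is optimized at a vertex, and the vertices are exactly the performance vectors of priority (index) policies.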
Following foundational contributions by Coffman and Mitrani (1980) and Shanthikumar and Yao (1992), Bertsimas and Niño-Mora (1996) took this approach decisively further forward and gave an account of Gittins indices from this perspective. Recently, Glazebrook and Garbe (1999) explained how the achievable region approach could be deployed to analyze Gittins index policies for systems in which the conditions sufficient to establish that such policies are fully optimal
[1] D. Blackwell. Discrete Dynamic Programming, 1962.
[2] A. F. Veinott. On Finding Optimal Policies in Discrete Dynamic Programming with No Discounting, 1966.
[3] B. L. Miller, et al. An Optimality Condition for Discrete Dynamic Programming with no Discounting, 1968.
[4] Edward G. Coffman, et al. A Characterization of Waiting Time Performance Realizable by Single-Server Queues, 1980, Oper. Res.
[5] R. Weber. Scheduling jobs by stochastic processing requirements on parallel machines to minimize makespan or flowtime, 1982, Journal of Applied Probability.
[6] J. Walrand, et al. Scheduling jobs with stochastically ordered processing times on parallel machines to minimize expected flowtime, 1986, Journal of Applied Probability.
[7] G. Weiss, et al. Approximation results in parallel machines stochastic scheduling, 1991.
[8] Gideon Weiss, et al. Turnpike Optimality of Smith's Rule in Parallel Machines Stochastic Scheduling, 1992, Math. Oper. Res.
[9] David D. Yao, et al. Multiclass Queueing Systems: Polymatroidal Structure and Optimal Scheduling Control, 1992, Oper. Res.
[10] Dimitris Bertsimas, et al. Conservation laws, extended polymatroids and multi-armed bandit problems: a unified approach to indexable systems, 2011, IPCO.
[11] G. Weiss, et al. On almost optimal priority rules for preemptive scheduling of stochastic jobs on parallel machines, 1995, Advances in Applied Probability.
[12] J.M. Schopf, et al. Stochastic Scheduling, 1999, ACM/IEEE SC 1999 Conference (SC'99).
[13] Kevin D. Glazebrook, et al. Almost optimal policies for stochastic systems which almost satisfy conservation laws, 1999, Ann. Oper. Res.