Index policies for a class of discounted restless bandits

This paper concerns a class of discounted restless bandit problems that possess an indexability property. Conservation laws yield an expression for the reward suboptimality of a general policy. These results are utilised to study the closeness to optimality of an index policy for a special class of simple and natural dual-speed restless bandits for which indexability is guaranteed. The strong performance of the index policy is confirmed by a computational study.
