On the optimality of the Gittins index rule for multi-armed bandits with multiple plays

Abstract. We investigate the general multi-armed bandit problem with multiple servers. We determine a condition on the reward processes that is sufficient to guarantee the optimality of the strategy that, at each instant of time, operates the projects with the highest Gittins indices. We call this strategy the Gittins index rule for multi-armed bandits with multiple plays, or briefly the Gittins index rule. We show by examples that: (i) the aforementioned sufficient condition is not necessary for the optimality of the Gittins index rule; and (ii) when the sufficient condition is relaxed, the Gittins index rule is not necessarily optimal. Finally, we present an application of the general results to the multiserver scheduling of parallel queues without arrivals.
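
As an illustration of the selection step that the Gittins index rule performs at one instant, the following minimal Python sketch picks the m projects with the highest indices. It assumes the Gittins indices have already been computed elsewhere; the function name `gittins_index_rule`, the parameter `m` (number of servers), and the tie-breaking by lower position are assumptions made for this sketch and do not come from the paper.

```python
from typing import List, Sequence


def gittins_index_rule(indices: Sequence[float], m: int) -> List[int]:
    """Return the positions of the m projects with the highest indices.

    `indices` holds one precomputed Gittins index per project (assumed given).
    `m` is the number of projects operated at this instant (assumed name).
    """
    if not 0 <= m <= len(indices):
        raise ValueError("m must be between 0 and the number of projects")
    # Rank project positions by index, highest first.
    # Ties are broken in favor of the lower position (an arbitrary choice).
    ranked = sorted(range(len(indices)), key=lambda i: (-indices[i], i))
    return sorted(ranked[:m])


# Hypothetical index values for four projects and two servers.
current_indices = [0.42, 0.91, 0.17, 0.66]
print(gittins_index_rule(current_indices, 2))  # [1, 3]
```

The sketch covers only the per-instant choice of projects. Whether repeating this choice at every instant is optimal depends on the condition on the reward processes discussed in the abstract.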
