Paper information - On the optimality of the Gittins index rule for multi-armed bandits with multiple plays - 字舞流文

On the optimality of the Gittins index rule for multi-armed bandits with multiple plays

Abstract. We investigate the general multi-armed bandit problem with multiple servers. We determine a condition on the reward processes sufficient to guarantee the optimality of the strategy that operates at each instant of time the projects with the highest Gittins indices. We call this strategy the Gittins index rule for multi-armed bandits with multiple plays, or briefly the Gittins index rule. We show by examples that: (i) the aforementioned sufficient condition is not necessary for the optimality of the Gittins index rule; and (ii) when the sufficient condition is relaxed the Gittins index rule is not necessarily optimal. Finally, we present an application of the general results to the multiserver scheduling of parallel queues without arrivals.

Demosthenis Teneketzis | Dimitrios G. Pandelis

[1] J. Gittins. Bandit processes and dynamic allocation indices, 1979.

[2] Dimitris Bertsimas, et al. Conservation laws, extended polymatroids and multi-armed bandit problems: a unified approach to indexable systems, 2011, IPCO.

[3] P. Whittle. Restless bandits: activity allocation in a changing world, 1988, Journal of Applied Probability.

[4] Dimitris Bertsimas, et al. Conservation Laws, Extended Polymatroids and Multiarmed Bandit Problems; A Polyhedral Approach to Indexable Systems, 1996, Math. Oper. Res.

[5] R. Weber. On the Gittins Index for Multiarmed Bandits, 1992.

[6] R. Agrawal, et al. Multi-armed bandit problems with multiple plays and switching cost, 1990.

[7] A. Mandelbaum. Discrete multi-armed bandits and multi-parameter processes, 1986.

[8] P. Whittle. Arm-Acquiring Bandits, 1981.

[9] Jean Walrand, et al. Extensions of the multiarmed bandit problem: The discounted case, 1985.

[10] J. Walrand, et al. Asymptotically efficient allocation rules for the multiarmed bandit problem with multiple plays-Part II: Markovian rewards, 1987.

[11] Demosthenis Teneketzis, et al. Multi-armed bandits with switching penalties, 1996, IEEE Trans. Autom. Control.

[12] P. Whittle. Restless Bandits: Activity Allocation in a Changing World, 1988.

[13] I. Karatzas. Gittins Indices in the Dynamic Allocation Problem for Diffusion Processes, 1984.

[14] J. Neveu, et al. Discrete Parameter Martingales, 1975.

[15] J. Tsitsiklis. A lemma on the multiarmed bandit problem, 1986.

[16] K. Glazebrook. On a sufficient condition for superprocesses due to Whittle, 1982, Journal of Applied Probability.

[17] F. Kelly. Multi-Armed Bandits with Discount Factor Near One: The Bernoulli Case, 1981.

[18] D. Teneketzis, et al. On the optimality of the Gittins index rule in multi-armed bandits with multiple plays, 1995, Proceedings of 1995 34th IEEE Conference on Decision and Control.

[19] P. Whittle. Multi-Armed Bandits and the Gittins Index, 1980.