Multi-armed bandits with switching costs

The multi-armed bandit problem with switching cost is investigated. It is shown that, along optimal policies, decisions about processor allocation need to be made only at stopping times that achieve an appropriate index: either the well-known Gittins index or a "switching index" defined to account for the switching cost. Furthermore, sufficient conditions for the optimality of allocation strategies, based on limited look-ahead techniques, are established. Together with the above feature of optimal scheduling strategies, these conditions simplify the search for an optimal allocation policy. Nevertheless, determining optimal allocation policies remains a difficult and challenging task.
