Multi-armed bandits with switching penalties

The multi-armed bandit problem with switching penalties (switching costs and switching delays) is investigated. It is shown that, under an optimal policy, decisions about processor allocation need to be made only at stopping times that achieve an appropriate index: either the well-known "Gittins index" or a "switching index" defined for problems with switching costs and switching delays. An algorithm for computing the switching index is presented. Sufficient conditions for the optimality of allocation strategies, based on limited look-ahead techniques, are also established. Together with the stopping-time property above, these conditions simplify the search for an optimal allocation policy. For a special class of multi-armed bandits (scheduling of parallel queues with switching penalties and no arrivals), the stopping-time property alone is sufficient to determine an optimal allocation strategy. In general, however, determining optimal allocation policies remains a difficult task.
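For orientation, the classical Gittins index mentioned above can be computed for a single finite-state arm as sketched below. This is not the switching-index algorithm of the paper, which the abstract does not spell out. It is a minimal illustration of the standard index without switching penalties, using the restart-in-state formulation (Katehakis and Veinott, 1987) solved by value iteration. The function name `gittins_indices`, the transition matrix `P`, the reward vector `r`, and the discount factor `beta` are all assumptions introduced here for illustration.

```python
def gittins_indices(P, r, beta, tol=1e-12, max_iter=100_000):
    """Gittins index of every state of one finite-state Markov arm.

    Hypothetical helper, not taken from the paper.
    P[j][k] : transition probability from state j to state k
    r[j]    : reward earned when the arm is played in state j
    beta    : discount factor, 0 < beta < 1
    """
    n = len(r)
    indices = []
    for i in range(n):
        # Restart problem for state i: in any state j, either continue
        # from j or restart as if the arm were in state i.
        V = [0.0] * n
        for _ in range(max_iter):
            # Q[j] is the value of playing once from j, then acting optimally.
            Q = [r[j] + beta * sum(P[j][k] * V[k] for k in range(n))
                 for j in range(n)]
            # Restarting from j yields exactly Q[i].
            V_new = [max(Q[j], Q[i]) for j in range(n)]
            delta = max(abs(a - b) for a, b in zip(V_new, V))
            V = V_new
            if delta < tol:
                break
        # The index is the restart value at i, normalized by (1 - beta).
        indices.append((1.0 - beta) * V[i])
    return indices


# Two-state arm: state 1 pays 0 and moves to state 0; state 0 pays 1 forever.
P = [[1.0, 0.0],
     [1.0, 0.0]]
r = [1.0, 0.0]
idx = gittins_indices(P, r, beta=0.9)
```

In this toy arm, state 0 earns 1 per play indefinitely, so its index is 1. State 1 earns nothing now but reaches state 0 after one play, so its index is the discount factor. With switching costs or delays, the index for leaving an arm and the index for staying on it generally differ, which is why a separate switching index is needed.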
