Paper information - Multi-armed bandits with switching costs

Multi-armed bandits with switching costs

The multi-armed bandit problem with switching costs is investigated. It is shown that, along optimal policies, decisions about processor allocation need to be made only at stopping times that achieve an appropriate index (the well-known "Gittins index" or a "switching index" defined for the case with switching costs). Furthermore, sufficient conditions for the optimality of allocation strategies, based on limited look-ahead techniques, are established. These conditions, together with the above-mentioned feature of optimal scheduling strategies, simplify the search for an optimal allocation policy. Nevertheless, determining optimal allocation policies remains a difficult and challenging task.
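The idea of a stopping time that "achieves an index" can be illustrated with a minimal sketch. It rests on assumptions that are not stated in the abstract: a single arm with a known, deterministic reward sequence, a discount factor `beta`, and a switching cost paid once when the arm is first engaged. Under those assumptions every stopping time is a fixed horizon, and the index is the best ratio of discounted reward (net of the switching cost) to discounted time. The function name `stopping_index` and its parameters are hypothetical and are not taken from the paper.

```python
def stopping_index(rewards, beta, switching_cost=0.0):
    """Return (index, horizon) for one deterministic arm.

    Assumptions (illustrative only): rewards[t] is the known reward at
    step t, beta is a discount factor in (0, 1), and switching_cost is
    paid once at time 0. With switching_cost == 0 this is a Gittins-style
    ratio; with a positive cost it is a switching-style ratio.
    """
    best_ratio = float("-inf")
    best_horizon = 0
    numerator = -switching_cost   # discounted reward net of the one-time cost
    denominator = 0.0             # discounted time spent on the arm
    discount = 1.0
    for horizon, reward in enumerate(rewards, start=1):
        numerator += discount * reward
        denominator += discount
        discount *= beta
        ratio = numerator / denominator
        if ratio > best_ratio:    # strict: ties keep the shorter horizon
            best_ratio, best_horizon = ratio, horizon
    return best_ratio, best_horizon


# Without a switching cost, stopping after the first (large) reward is best.
no_cost = stopping_index([3.0, 1.0], beta=0.5)

# With a switching cost of 3, staying for both steps gives the better ratio.
with_cost = stopping_index([3.0, 1.0], beta=0.5, switching_cost=3.0)
```

In this toy case the switching cost lengthens the horizon that attains the index, which matches the intuition that a policy paying to switch should revisit its allocation less often. Stochastic arms require optimizing over genuine stopping times and are not covered by this sketch.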

Demosthenis Teneketzis | M. Asawa

[1] Jean Walrand, et al. Extensions of the multiarmed bandit problem: The discounted case, 1985.

[2] D. Teneketzis, et al. Optimality of index policies for stochastic scheduling with switching penalties, 1992, Journal of Applied Probability.

[3] F. Kelly. Multi-Armed Bandits with Discount Factor Near One: The Bernoulli Case, 1981.

[4] J. Gittins. Bandit processes and dynamic allocation indices, 1979.

[5] K. Glazebrook. On a sufficient condition for superprocesses due to Whittle, 1982, Journal of Applied Probability.

[6] Kevin D. Glazebrook, et al. Methods for the Evaluation of Permutations as Strategies in Stochastic Scheduling Problems, 1983.

[7] J. Tsitsiklis. A lemma on the multiarmed bandit problem, 1986.

[8] D. Teneketzis, et al. The multi-armed bandit problem with switching cost, 1987, 26th IEEE Conference on Decision and Control.

[9] R. Agrawal, et al. Asymptotically efficient adaptive allocation schemes for controlled Markov chains: finite parameter space, 1989.

[10] K. Glazebrook. On the evaluation of fixed permutations as strategies in stochastic scheduling, 1982.

[11] Kevin D. Glazebrook. On the evaluation of suboptimal strategies for families of alternative bandit processes, 1982.