Paper information - Generalized Bandit Problems

This chapter examines a number of extensions of the multi-armed bandit framework. We consider the possibility of an infinite number of available arms, we give conditions under which the Gittins index strategy is well-defined, and we examine the optimality of that strategy. We then consider some difficulties arising from “parallel search,” in which a decision-maker may pull more than one arm per period, and from the introduction of a cost of switching between arms.

Rangarajan K. Sundaram
