Generalized Bandit Problems

This chapter examines a number of extensions of the multi-armed bandit framework. We consider the possibility of an infinite number of available arms, we give conditions under which the Gittins index strategy is well-defined, and we examine the optimality of that strategy. We then consider some difficulties arising from “parallel search,” in which a decision-maker may pull more than one arm per period, and from the introduction of a cost of switching between arms.
