On the value of learning for Bernoulli bandits with unknown parameters
[1] R. Bellman. A problem in the sequential design of experiments, 1954.
[2] M. H. DeGroot. Optimal Statistical Decisions, 1970.
[3] U. Rieder. Bayesian dynamic programming, 1975, Advances in Applied Probability.
[4] P. Kumar et al. On the optimal solution of the one-armed bandit adaptive control problem, 1981.
[5] S. M. Ross. Stochastic Processes, 1983.
[6] P. R. Kumar. A Survey of Some Results in Stochastic Adaptive Control, 1985, SIAM Journal on Control and Optimization.
[7] P. W. Jones et al. Bandit Problems: Sequential Allocation of Experiments, 1987.
[8] J. Bather et al. Multi-Armed Bandit Allocation Indices, 1990.
[9] D. Berry et al. Worth of perfect information in Bernoulli bandits, 1991, Advances in Applied Probability.
[10] J. Gittins et al. The Learning Component of Dynamic Allocation Indices, 1992.
[11] M. L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.