On the value of learning for Bernoulli bandits with unknown parameters
[1] R. Bellman. A problem in the sequential design of experiments, 1954.
[2] M. H. DeGroot. Optimal Statistical Decisions, 1970.
[3] U. Rieder. Bayesian dynamic programming, 1975, Advances in Applied Probability.
[4] P. Kumar et al. On the optimal solution of the one-armed bandit adaptive control problem, 1981.
[5] S. M. Ross. Stochastic Processes, 1983.
[6] P. R. Kumar. A Survey of Some Results in Stochastic Adaptive Control, 1985, SIAM Journal on Control and Optimization.
[7] P. W. Jones et al. Bandit Problems: Sequential Allocation of Experiments, 1987.
[8] J. Bather et al. Multi-Armed Bandit Allocation Indices, 1990.
[9] D. Berry et al. Worth of perfect information in Bernoulli bandits, 1991, Advances in Applied Probability.
[10] J. Gittins et al. The Learning Component of Dynamic Allocation Indices, 1992.
[11] M. L. Puterman. Markov Decision Processes: Discrete Stochastic Dynamic Programming, 1994.