Paper Information - Finite-Time Regret Bounds for the Multiarmed Bandit Problem - 字舞流文

Finite-Time Regret Bounds for the Multiarmed Bandit Problem

Nicolò Cesa-Bianchi | Paul Fischer

[1] G. Lugosi, et al. Minimax lower bounds for the two-armed bandit problem, 1997, Proceedings of the 36th IEEE Conference on Decision and Control.

[2] Andrew W. Moore, et al. Reinforcement Learning: A Survey, 1996, J. Artif. Intell. Res.

[3] R. Agrawal. Sample mean based index policies by O(log n) regret for the multi-armed bandit problem, 1995, Advances in Applied Probability.

[4] Nicolò Cesa-Bianchi, et al. Gambling in a rigged casino: The adversarial multi-armed bandit problem, 1995, Proceedings of IEEE 36th Annual Foundations of Computer Science.

[5] Ben J. A. Kröse, et al. Learning from delayed rewards, 1995, Robotics Auton. Syst.

[6] Michael O. Duff, et al. Q-Learning for Bandit Problems, 1995, ICML.

[7] Andrew G. Barto, et al. Learning to Act Using Real-Time Dynamic Programming, 1995, Artif. Intell.

[8] P. Varaiya, et al. Multi-Armed bandit problem revisited, 1994.

[9] Leslie Pack Kaelbling, et al. Learning in embedded systems, 1993.

[10] Bruce E. Hajek, et al. Cooling Schedules for Optimal Annealing, 1988, Math. Oper. Res.

[11] H. Robbins, et al. Asymptotically efficient adaptive allocation rules, 1985.

[12] V. Cerný. Thermodynamical approach to the traveling salesman problem: An efficient simulation algorithm, 1985.

[13] C. D. Gelatt, et al. Optimization by Simulated Annealing, 1983, Science.

[14] J. Neveu, et al. Discrete Parameter Martingales, 1975.

[15] W. Hoeffding. Probability inequalities for sums of bounded random variables, 1963.