Adaptive epsilon-Greedy Exploration in Reinforcement Learning Based on Value Difference
[1] Warren B. Powell, et al. Adaptive stepsizes for recursive estimation with applications in approximate dynamic programming, 2006, Machine Learning.
[2] Baruch Awerbuch, et al. Adaptive routing with end-to-end feedback: distributed learning and geometric approaches, 2004, STOC '04.
[3] Richard S. Sutton, et al. Introduction to Reinforcement Learning, 1998.
[4] Mehryar Mohri, et al. Multi-armed Bandit Algorithms and Empirical Evaluation, 2005, ECML.
[5] Junichiro Yoshimoto, et al. Control of exploitation-exploration meta-parameter in reinforcement learning, 2002, Neural Networks.
[6] Stefan Schaal, et al. Natural Actor-Critic, 2003, Neurocomputing.
[7] H. Robbins. Some aspects of the sequential design of experiments, 1952.
[8] Sebastian Thrun, et al. Efficient Exploration in Reinforcement Learning, 1992.
[9] Gianluca Bontempi, et al. Improving the Exploration Strategy in Bandit Algorithms, 2008, LION.
[10] Chris Watkins, et al. Learning from delayed rewards, 1989.
[11] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[12] Ronen I. Brafman, et al. R-MAX - A General Polynomial Time Algorithm for Near-Optimal Reinforcement Learning, 2001, J. Mach. Learn. Res.
[13] Rina Azoulay-Schwartz, et al. Exploitation vs. exploration: choosing a supplier in an environment of incomplete information, 2004, Decis. Support Syst.
[14] Dimitri P. Bertsekas, et al. Dynamic Programming: Deterministic and Stochastic Models, 1987.