Value-Difference Based Exploration: Adaptive Control between Epsilon-Greedy and Softmax