A new Q-learning algorithm based on the Metropolis criterion
[1] N. Metropolis, et al. Equation of State Calculations by Fast Computing Machines, 1953, Resonance.
[2] Peter Dayan, et al. Q-learning, 1992, Machine Learning.
[3] Doina Precup, et al. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning, 1999, Artif. Intell.
[4] A. Percus, et al. Nature's Way of Optimizing, 1999, Artif. Intell.
[5] Chris Watkins, et al. Learning from delayed rewards, 1989.
[6] Richard S. Sutton, et al. Reinforcement Learning: An Introduction, 1998, IEEE Trans. Neural Networks.
[7] C. D. Gelatt, et al. Optimization by Simulated Annealing, 1983, Science.
[8] Richard S. Sutton, et al. Learning to predict by the methods of temporal differences, 1988, Machine Learning.
[9] Andrew W. Moore, et al. Reinforcement Learning: A Survey, 1996, J. Artif. Intell. Res.
[10] Minoru Asada, et al. Cooperative Behavior Acquisition for Mobile Robots in Dynamically Changing Real Worlds Via Vision-Based Reinforcement Learning and Development, 1999, Artif. Intell.
[11] Thomas G. Dietterich, et al. Explanation-Based Learning and Reinforcement Learning: A Unified View, 1995, Machine-mediated learning.