A new Q-learning algorithm based on the Metropolis criterion

The balance between exploration and exploitation is one of the key problems of action selection in Q-learning. Pure exploitation causes the agent to converge quickly to locally optimal policies, whereas excessive exploration degrades the performance of Q-learning, even though it may accelerate learning and help the agent avoid locally optimal policies. In this paper, finding the optimal policy in Q-learning is formulated as searching for the optimal solution of a combinatorial optimization problem. The Metropolis criterion of the simulated annealing algorithm is introduced to balance exploration and exploitation in Q-learning, and a modified Q-learning algorithm based on this criterion, SA-Q-learning, is presented. Experiments show that SA-Q-learning converges more quickly than Q-learning or Boltzmann exploration, and that its search does not suffer from the performance degradation caused by excessive exploration.
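The abstract does not spell out how the Metropolis criterion chooses between exploring and exploiting. The following is a minimal sketch of one plausible reading, not the paper's published algorithm: at each step a uniformly random candidate action a_r is compared with the greedy action a_p and accepted with probability min(1, exp((Q(s, a_r) - Q(s, a_p)) / T)); otherwise a_p is taken, and T is lowered geometrically after each episode. The corridor environment, the functions `step`, `metropolis_action`, `sa_q_learning` and `greedy_policy`, the cooling factor, and every parameter value are hypothetical choices made only for illustration.

```python
import math
import random
from collections import defaultdict

# Hypothetical toy task (not from the paper): a corridor of states 0..5,
# the agent starts at 0 and receives reward 1 on reaching the goal state 5.
N_STATES = 6
GOAL = N_STATES - 1
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Deterministic corridor transition; moving left from state 0 stays at 0."""
    nxt = max(0, state - 1) if action == 0 else state + 1
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward, nxt == GOAL


def metropolis_action(q, state, temperature, rng):
    """Metropolis-style action selection (illustrative sketch).

    Propose a uniformly random action and compare it with the greedy one.
    Accept the candidate with probability min(1, exp(delta / T)), where
    delta = Q(s, candidate) - Q(s, greedy); otherwise take the greedy action.
    """
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    greedy = max(ACTIONS, key=lambda a: q[(state, a)])
    candidate = rng.choice(ACTIONS)
    delta = q[(state, candidate)] - q[(state, greedy)]  # never positive
    if delta >= 0 or rng.random() < math.exp(delta / temperature):
        return candidate
    return greedy


def sa_q_learning(episodes=500, alpha=0.5, gamma=0.9,
                  t0=1.0, cooling=0.99, t_min=0.01, max_steps=200, seed=0):
    """Tabular Q-learning whose behaviour policy is metropolis_action."""
    rng = random.Random(seed)
    q = defaultdict(float)  # Q(s, a), initialised to 0
    temperature = t0
    for _ in range(episodes):
        state = 0
        for _ in range(max_steps):
            action = metropolis_action(q, state, temperature, rng)
            nxt, reward, done = step(state, action)
            # Standard one-step Q-learning target: r + gamma * max_b Q(s', b).
            best_next = 0.0 if done else max(q[(nxt, b)] for b in ACTIONS)
            td_error = reward + gamma * best_next - q[(state, action)]
            q[(state, action)] += alpha * td_error
            state = nxt
            if done:
                break
        # Geometric cooling: near-random proposals early, near-greedy later.
        temperature = max(t_min, temperature * cooling)
    return q


def greedy_policy(q):
    """Greedy action in every non-goal state."""
    return [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)]


if __name__ == "__main__":
    print(greedy_policy(sa_q_learning()))
```

As T falls, a worse candidate is accepted less and less often, so the agent shifts from broad exploration toward exploitation, mirroring how simulated annealing accepts fewer uphill moves as it cools. Boltzmann exploration, by contrast, samples every action from a softmax over all Q-values, P(a) proportional to exp(Q(s, a) / T), at each step.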
