Paper information - Reinforcement Learning by Stochastic Hill Climbing on Discounted Reward

Abstract: Reinforcement learning systems are often required to find stochastic policies rather than deterministic ones. They are also required to gain more reward while learning. Q-learning was not designed for stochastic policies, and it does not guarantee rational behavior partway through learning. This paper presents a new reinforcement learning approach, based on a simple credit-assignment scheme, for finding memory-less policies. It satisfies the above requirements by treating the policy and the exploration strategy as one and the same. The mathematical analysis shows that the proposed method is a stochastic gradient ascent on discounted reward in Markov decision processes (MDPs) and that it is related to the average-reward framework. The analysis assures that the proposed method can be extended to continuous environments. We further compare its behavior with that of Q-learning on a small MDP example and on a non-Markovian one.
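The abstract does not spell out the update rule, so the following is only a minimal sketch of the general idea it names: stochastic gradient ascent on discounted reward with a memory-less stochastic policy that doubles as the exploration strategy. It assumes a softmax policy and a discounted trace of the score, which is a generic policy-gradient construction and not necessarily the paper's exact algorithm. The two-state environment, the function names (`softmax`, `train`), and all parameter values are hypothetical and chosen only for illustration.

```python
import math
import random


def softmax(prefs):
    """Turn action preferences into a probability distribution."""
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def train(steps=5000, alpha=0.05, gamma=0.9, seed=0):
    """Hill-climb a memory-less stochastic policy on a toy two-state MDP.

    Toy environment (an assumption, not from the paper): the state
    alternates 0, 1, 0, 1, ... and the reward is 1 when the chosen
    action equals the current state, otherwise 0.
    """
    rng = random.Random(seed)
    n_states, n_actions = 2, 2
    # theta[s][a]: preference for action a in state s (the policy parameters)
    theta = [[0.0] * n_actions for _ in range(n_states)]
    # trace[s][a]: discounted sum of past scores d/dtheta log pi(a_t | s_t)
    trace = [[0.0] * n_actions for _ in range(n_states)]
    state = 0
    for _ in range(steps):
        # The policy itself supplies the exploration: sample an action from it.
        probs = softmax(theta[state])
        action = 0 if rng.random() < probs[0] else 1
        reward = 1.0 if action == state else 0.0

        # Decay every trace entry by the discount factor.
        for s in range(n_states):
            for a in range(n_actions):
                trace[s][a] *= gamma
        # Add the score of the chosen action: indicator minus probability.
        for a in range(n_actions):
            indicator = 1.0 if a == action else 0.0
            trace[state][a] += indicator - probs[a]

        # Stochastic gradient step: reward times discounted trace.
        for s in range(n_states):
            for a in range(n_actions):
                theta[s][a] += alpha * reward * trace[s][a]

        state = 1 - state  # deterministic alternation between the two states
    return [softmax(row) for row in theta]


if __name__ == "__main__":
    policy = train()
    for s, row in enumerate(policy):
        print("state", s, "action probabilities", row)
```

Because actions are always sampled from the current policy, there is no separate exploration schedule to tune; the policy stays stochastic throughout and simply shifts probability toward the rewarded action in each state.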
