Reinforcement learning of dynamic behavior by using recurrent neural networks

Reinforcement learning is a learning scheme for finding the optimal policy to control a system, based on a scalar signal representing a reward or a punishment. If the controller's observation of the system is rich enough to represent the system's internal state, the controller can achieve the optimal policy simply by learning reactive behavior. However, if the state of the controlled system cannot be assessed completely from current sensory observations, the controller must learn dynamic behavior to achieve the optimal policy. In this paper, we propose a dynamic controller scheme that uses memory of past system outputs to uncover hidden states and bases its control decisions on that memory. The scheme integrates Q-learning, as proposed by Watkins, with recurrent neural networks of several types. It performs favorably in simulations of a task with hidden states.
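The idea of pairing Q-learning with a recurrent network can be illustrated with a minimal sketch. It assumes an Elman-style network whose context units carry a summary of past observations; the abstract only says "recurrent neural networks of several types", so the class name `ElmanQNetwork`, the layer sizes, the random weights, and the discount factor below are illustrative assumptions, not the architecture used in the paper. Only the forward pass and the one-step Q-learning target are shown; weight training is omitted.

```python
import math
import random


class ElmanQNetwork:
    """Hypothetical Elman-style recurrent network that maps observations to Q-values."""

    def __init__(self, n_obs, n_hidden, n_actions, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            # Small random weights; a trained network would learn these.
            return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.w_in = init(n_hidden, n_obs)       # observation -> hidden
        self.w_rec = init(n_hidden, n_hidden)   # context (previous hidden) -> hidden
        self.w_out = init(n_actions, n_hidden)  # hidden -> one Q-value per action
        self.n_hidden = n_hidden
        self.reset()

    def reset(self):
        # Clear the memory at the start of an episode.
        self.context = [0.0] * self.n_hidden

    def step(self, obs):
        # The hidden state depends on the current observation AND the stored context.
        hidden = []
        for i in range(self.n_hidden):
            total = sum(w * o for w, o in zip(self.w_in[i], obs))
            total += sum(w * c for w, c in zip(self.w_rec[i], self.context))
            hidden.append(math.tanh(total))
        self.context = hidden
        return [sum(w * h for w, h in zip(row, hidden)) for row in self.w_out]


def q_learning_target(reward, next_q_values, gamma=0.9):
    # One-step Q-learning target: r + gamma * max_a' Q(s', a').
    return reward + gamma * max(next_q_values)


def q_after(net, observations):
    # Feed a whole observation sequence and return the Q-values at the last step.
    net.reset()
    q_values = None
    for obs in observations:
        q_values = net.step(obs)
    return q_values


net = ElmanQNetwork(n_obs=2, n_hidden=4, n_actions=2)
# Both histories end in the same observation [1, 0], i.e. the final states are aliased.
q_a = q_after(net, [[1, 0], [1, 0]])
q_b = q_after(net, [[0, 1], [1, 0]])
print(q_a, q_b)
```

Because the context units differ after the two histories, the network can assign different Q-values to the same final observation, which a purely reactive controller cannot do.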

[1]  Jeffrey L. Elman,et al.  Finding Structure in Time , 1990, Cogn. Sci.

[2]  Peter Dayan,et al.  Q-learning , 1992, Machine Learning.

[3]  Andrew G. Barto,et al.  Learning to Act Using Real-Time Dynamic Programming , 1995, Artif. Intell.

[4]  Itsuki Noda,et al.  Neural Networks that Learn Symbolic and Structured Representation of Information , 1995 .

[5]  Richard S. Sutton,et al.  Learning to predict by the methods of temporal differences , 1988, Machine Learning.

[6]  Dana H. Ballard,et al.  Active Perception and Reinforcement Learning , 1990, Neural Computation.

[7]  Dana H. Ballard,et al.  Learning to perceive and act by trial and error , 1991, Machine Learning.

[8]  Michael L. Littman,et al.  Memoryless policies: theoretical limitations and practical results , 1994 .

[9]  Peter Dayan,et al.  Technical Note: Q-Learning , 2004, Machine Learning.

[10]  P. Werbos,et al.  Beyond Regression: New Tools for Prediction and Analysis in the Behavioral Sciences , 1974 .

[11]  Paul J. Werbos.  A menu of designs for reinforcement learning over time , 1990 .

[12]  Ming Tan,et al.  Cost-Sensitive Reinforcement Learning for Adaptive Classification and Control , 1991, AAAI.

[13]  Long Ji Lin,et al.  Self-improving reactive agents based on reinforcement learning, planning and teaching , 1992, Machine Learning.

[14]  R. Andrew McCallum.  Hidden State and Reinforcement Learning with Instance-Based State Identification , 1996 .

[15]  Lonnie Chrisman,et al.  Reinforcement Learning with Perceptual Aliasing: The Perceptual Distinctions Approach , 1992, AAAI.