BOOK: Storing Algorithm-Invariant Episodes for Deep Reinforcement Learning

We introduce a novel method for training reinforcement learning (RL) agents by sharing knowledge in a manner analogous to a book. Recorded information in book form is the primary means by which humans acquire knowledge. Conventional deep RL methods, however, have focused either on experiential learning, where the agent learns from scratch through interactions with the environment, or on imitation learning, which tries to mimic a teacher. In contrast, our proposed book learning shares key information among different agents in a book-like manner through two characteristic features: (1) by defining a linguistic function, input states can be clustered semantically into a relatively small number of core clusters, which are forwarded to other RL agents in a prescribed manner; and (2) by defining state priorities and the contents to record, core experiences can be selected and stored in a small container, which we call a 'BOOK'. Our method learns hundreds to thousands of times faster than conventional methods by learning from only a handful of core cluster entries, showing that deep RL agents can learn effectively from knowledge shared by other agents.
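
The abstract does not give the exact form of the linguistic function or the priority measure, so the following is only a minimal sketch of the idea: it assumes a quantization-based clustering function (`state_key`) and a scalar priority, both hypothetical names introduced here for illustration. It shows how a small, cluster-indexed container could keep one core experience per semantic cluster for other agents to read.

```python
# Hypothetical sketch of a BOOK container. States are mapped to a small
# number of semantic clusters by an assumed "linguistic function"
# (state_key), and only the highest-priority experience per cluster is kept.

def state_key(state, resolution=0.5):
    """Assumed linguistic function: quantize a continuous state vector so
    that semantically similar states share one cluster key."""
    return tuple(round(x / resolution) for x in state)

class Book:
    """Small container of core experiences, one entry per cluster."""

    def __init__(self, capacity=64):
        self.capacity = capacity
        self.entries = {}  # cluster key -> (priority, experience)

    def record(self, state, action, reward, value, priority):
        key = state_key(state)
        # Keep only the highest-priority experience seen for this cluster.
        if key not in self.entries or priority > self.entries[key][0]:
            self.entries[key] = (priority, (state, action, reward, value))
        # If the book is over capacity, evict the lowest-priority cluster.
        if len(self.entries) > self.capacity:
            worst = min(self.entries, key=lambda k: self.entries[k][0])
            del self.entries[worst]

    def read(self):
        """Yield the stored core experiences, e.g. for another agent."""
        for _, experience in self.entries.values():
            yield experience


# Example usage: a writing agent records experiences; a reading agent
# could seed its replay buffer from book.read() before acting, which is
# one plausible source of the speed-up claimed in the abstract.
book = Book(capacity=8)
book.record(state=[0.12, -0.3], action=1, reward=0.5, value=1.2, priority=0.9)
for s, a, r, v in book.read():
    print(s, a, r, v)
```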
