Solving Partially Observable Reinforcement Learning Problems with Recurrent Neural Networks

The aim of this chapter is to provide a series of tricks and recipes for neural state estimation, particularly for real-world applications of reinforcement learning. We use various topologies of recurrent neural networks, as they make it possible to identify the continuous-valued, possibly high-dimensional state space of complex dynamical systems. Recurrent neural networks explicitly account for time and memory, and in principle they can model any type of dynamical system. These capabilities make recurrent neural networks a suitable tool for approximating a Markovian state space of a dynamical system; in a second step, reinforcement learning methods can be applied to the estimated state to solve a defined control problem. Besides the trick of using a recurrent neural network for state estimation, various issues arising in real-world problems, such as large sets of observables and long-term dependencies, are addressed.
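
To make the two-step idea concrete, below is a minimal sketch in Python/NumPy of a recurrent state estimator; it is an illustration under stated assumptions, not the chapter's exact architecture. An Elman-style recurrence rolls over the observation sequence, and its hidden state s_t serves as the approximate Markovian state that a reinforcement learning method would consume. The dimensions, the random (untrained) weights, and the names estimate_states, W_in, and W_rec are hypothetical; training of the weights, e.g. by backpropagation through time against next-observation targets, is omitted.

import numpy as np

rng = np.random.default_rng(0)
obs_dim, state_dim = 4, 8  # hypothetical sizes of observation and internal state

# Weights of the recurrent state estimator (random here; in practice trained,
# e.g. with backpropagation through time, to predict the next observation).
W_in = rng.normal(scale=0.1, size=(state_dim, obs_dim))     # observation -> state
W_rec = rng.normal(scale=0.1, size=(state_dim, state_dim))  # state -> state (memory)

def estimate_states(observations):
    """Roll the recurrence over a sequence and return the internal state at each step."""
    s = np.zeros(state_dim)
    states = []
    for o in observations:
        # s_t = tanh(W_rec s_{t-1} + W_in o_t): the recurrence carries memory,
        # so s_t can summarize the history needed to make the problem Markovian.
        s = np.tanh(W_rec @ s + W_in @ o)
        states.append(s)
    return np.array(states)

# A (random) observation sequence stands in for sensor readings; a reinforcement
# learning method would operate on the returned states instead of the raw observations.
obs_seq = rng.normal(size=(10, obs_dim))
states = estimate_states(obs_seq)
print(states.shape)  # (10, 8)

In this scheme the state estimator and the reinforcement learner are decoupled: the recurrent network is trained once to summarize observation histories, and a method such as neural fitted Q-iteration can then treat the resulting states as if the control problem were fully observable.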
