Investigating Action Encodings in Recurrent Neural Networks in Reinforcement Learning

Building and maintaining state to learn policies and value functions is critical for deploying reinforcement learning (RL) agents in the real world. Recurrent neural networks (RNNs) have become a key point of interest for the state-building problem, and several large-scale RL agents incorporate recurrent networks. Yet while RNNs are a mainstay in many RL applications, the key design choices and implementation details responsible for their performance improvements often go unreported. In this work, we examine one axis along which RNN architectures can be (and have been) modified for use in RL: how action information is incorporated into the state update function of a recurrent cell. We describe several choices for encoding action information and empirically evaluate the resulting architectures on a set of illustrative domains. Finally, we outline future work on developing recurrent cells and highlight challenges specific to the RL setting.
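
To make the design axis concrete, below is a minimal sketch in plain Julia of two ways action information can enter a gated recurrent update: concatenated with the observation as an extra input, or used to select per-action recurrent weights so the action conditions the dynamics multiplicatively. The abstract names the axis but not these exact forms, so the update equations, function names, and weight shapes here are illustrative assumptions, not the paper's method.

```julia
# Sketch of two hypothetical action encodings for a GRU-style update.
# No packages required; all names and shapes are illustrative.

σ(x) = 1 / (1 + exp(-x))   # logistic sigmoid

# Option 1: concatenate the (one-hot) action with the observation, so the
# action is treated as just another input feature to a standard GRU update.
function gru_concat(h, o, a, Wz, Uz, Wr, Ur, Wh, Uh)
    x = vcat(o, a)                           # action enters via the input
    z = σ.(Wz * x .+ Uz * h)                 # update gate
    r = σ.(Wr * x .+ Ur * h)                 # reset gate
    h_cand = tanh.(Wh * x .+ Uh * (r .* h))  # candidate state
    return (1 .- z) .* h .+ z .* h_cand
end

# Option 2: give each discrete action its own recurrent weight matrices, so
# the action conditions the update multiplicatively (by selecting the
# recurrent dynamics) rather than appearing as an extra input.
function gru_action_select(h, o, a_idx, Wz, Uz_a, Wr, Ur_a, Wh, Uh_a)
    z = σ.(Wz * o .+ Uz_a[a_idx] * h)
    r = σ.(Wr * o .+ Ur_a[a_idx] * h)
    h_cand = tanh.(Wh * o .+ Uh_a[a_idx] * (r .* h))
    return (1 .- z) .* h .+ z .* h_cand
end

# Tiny usage example for Option 1 with random weights (hypothetical sizes).
n_h, n_o, n_a = 4, 3, 2
Wz = randn(n_h, n_o + n_a); Uz = randn(n_h, n_h)
Wr = randn(n_h, n_o + n_a); Ur = randn(n_h, n_h)
Wh = randn(n_h, n_o + n_a); Uh = randn(n_h, n_h)
h = zeros(n_h); o = randn(n_o); a = [1.0, 0.0]   # one-hot action
h = gru_concat(h, o, a, Wz, Uz, Wr, Ur, Wh, Uh)
```

The two sketches illustrate the trade-off at stake: the concatenation form treats actions like any other observation feature, while the per-action form lets the action directly change the recurrent dynamics at the cost of more parameters.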
