AMRL: Aggregated Memory For Reinforcement Learning

In many partially observable scenarios, Reinforcement Learning (RL) agents must rely on long-term memory in order to learn an optimal policy. We demonstrate that memory techniques borrowed from NLP and supervised learning fail at RL tasks due to stochasticity from the environment and from exploration. Drawing on these insights into the limitations of traditional memory methods in RL, we propose AMRL, a class of models that learn better policies with greater sample efficiency and are resilient to noisy inputs. Specifically, our models use a standard memory module to summarize short-term context, and then aggregate all prior states produced by that module without respect to order. We show that this provides advantages in terms of both gradient decay and signal-to-noise ratio over time. Evaluating in Minecraft and maze environments that test long-term memory, we find that our model improves average return by 19% over a baseline with the same number of parameters and by 9% over a stronger baseline with far more parameters.
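To make the aggregation idea concrete, the sketch below is a minimal, illustrative implementation of an AMRL-style model, not the authors' released code: a standard LSTM summarizes short-term context, a running order-invariant max aggregates its outputs over all prior steps, and a straight-through estimator keeps gradients from decaying through the aggregator. All names and sizes (AMRLMax, obs_dim, hidden_dim, num_actions) are assumptions for illustration; average or sum aggregators would be analogous instances of the same idea.

```python
import torch
import torch.nn as nn


class AMRLMax(nn.Module):
    """Minimal AMRL-style sketch: an LSTM supplies short-term context, and a
    running max over its outputs supplies an order-invariant, noise-resilient
    long-term summary (hypothetical layer sizes and policy head)."""

    def __init__(self, obs_dim: int, hidden_dim: int, num_actions: int):
        super().__init__()
        self.lstm = nn.LSTM(obs_dim, hidden_dim, batch_first=True)
        self.policy_head = nn.Linear(2 * hidden_dim, num_actions)

    def forward(self, obs_seq: torch.Tensor) -> torch.Tensor:
        # obs_seq: (batch, time, obs_dim)
        h, _ = self.lstm(obs_seq)                # short-term memory summary
        agg = torch.cummax(h, dim=1).values      # aggregate all prior states, order-invariant
        # Straight-through estimator: use the aggregated value in the forward
        # pass, but let gradients flow through h as if the aggregator were the
        # identity, which avoids gradient decay through the max.
        agg = h + (agg - h).detach()
        return self.policy_head(torch.cat([h, agg], dim=-1))  # per-step action logits


# Usage: logits for 4 trajectories of length 100 with 32-dimensional observations.
logits = AMRLMax(obs_dim=32, hidden_dim=64, num_actions=6)(torch.randn(4, 100, 32))
```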
