Using neural networks in the reinforcement learning (RL) framework has achieved notable successes. Yet neural networks tend to forget what they learned in the past, especially when they learn online and fully incrementally, a setting in which the weights are updated after each sample is received and the sample is then discarded. In this setting, a single update can generalize too globally by changing too many weights at once. This global generalization interferes with what was previously learned and degrades performance, a phenomenon known as catastrophic interference. Many previous works mitigate interference with mechanisms such as experience replay (ER) buffers, which enable minibatch updates over data that are approximately independent and identically distributed (i.i.d.). However, ER becomes infeasible in terms of memory as problem complexity increases, so more memory-efficient alternatives are needed. Interference can be averted if global updates are replaced with more local ones, so that only the weights responsible for the observed sample are updated. In this work, we propose using a dynamic self-organizing map (DSOM) together with a neural network to induce such locality in the updates without an ER buffer. Our method learns a DSOM that produces a mask to reweight each hidden unit's output, modulating its degree of use; this prevents interference by replacing global updates with local ones, conditioned on the agent's state. We validate our method on standard RL benchmarks, including Mountain Car and Lunar Lander, where existing methods often fail to learn without ER. Empirically, our online and fully incremental method is on par with, and in some cases better than, the state of the art in final performance and learning speed. We provide visualizations and quantitative measures showing that our method indeed mitigates interference.
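To make the gating idea concrete, the sketch below shows one plausible way a DSOM can mask a hidden layer so that updates stay local to the current state. This is only an illustration of the general mechanism described in the abstract, not the authors' exact formulation: the class name `DSOMGate`, the Gaussian mask, the `elasticity` and learning-rate values, and the tiny value network are all assumptions.

```python
import numpy as np

class DSOMGate:
    """Illustrative DSOM whose activations gate a hidden layer.

    One prototype per hidden unit lives in state space; units whose
    prototype is close to the current state receive a mask value near 1,
    the rest near 0, so weight updates stay local to that state region.
    """

    def __init__(self, n_hidden, state_dim, elasticity=1.0, lr=0.05, seed=0):
        rng = np.random.default_rng(seed)
        self.prototypes = rng.uniform(-1.0, 1.0, size=(n_hidden, state_dim))
        self.elasticity = elasticity  # width of the soft neighborhood (assumed)
        self.lr = lr

    def mask(self, state):
        """Soft 0-1 mask over hidden units, conditioned on the state."""
        d = np.linalg.norm(self.prototypes - state, axis=1)
        scale = d.max() + 1e-8
        return np.exp(-(d / (self.elasticity * scale)) ** 2)

    def update(self, state):
        """DSOM-style online update: prototypes move toward the state,
        scaled by their distance to it, so well-fitted regions move little."""
        diff = state - self.prototypes
        d = np.linalg.norm(diff, axis=1, keepdims=True)
        d_win = d.min() + 1e-8
        neighborhood = np.exp(-(d / (self.elasticity * d_win)) ** 2)
        self.prototypes += self.lr * d * neighborhood * diff

# Hypothetical usage with a tiny two-layer value network (illustrative only):
gate = DSOMGate(n_hidden=64, state_dim=2)
W1 = np.random.randn(2, 64) * 0.1
W2 = np.random.randn(64, 3) * 0.1

def q_values(state):
    h = np.tanh(state @ W1)       # hidden activations
    h = h * gate.mask(state)      # DSOM mask localizes the representation
    return h @ W2                 # per-action value estimates

state = np.array([0.3, -0.5])
print(q_values(state))
gate.update(state)                # the DSOM itself also learns online
```

Because the mask is near zero for hidden units whose prototypes lie far from the current state, the gradient of a temporal-difference loss through `q_values` barely touches their incoming and outgoing weights, which is the locality property the method relies on to avoid interference without an ER buffer.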