Learning to Remember, Forget and Ignore using Attention Control in Memory

Typical neural networks with external memory do not effectively separate capacity for episodic and working memory, as is required for reasoning in humans. Applying knowledge gained from psychological studies, we designed a new model called Differentiable Working Memory (DWM) to specifically emulate human working memory. Because it shows the same functional characteristics as working memory, it robustly learns psychology-inspired tasks and converges faster than comparable state-of-the-art models. Moreover, the DWM model successfully generalizes to sequences two orders of magnitude longer than the ones used in training. Our in-depth analysis shows that the behavior of DWM is interpretable and that it learns to have fine control over memory, allowing it to retain, ignore or forget information based on its relevance.
