Improving Differentiable Neural Computers Through Memory Masking, De-allocation, and Link Distribution Sharpness Control

The Differentiable Neural Computer (DNC) can learn algorithmic and question answering tasks. An analysis of its internal activation patterns reveals three problems. First, and most importantly, the lack of key-value separation makes the address distribution resulting from content-based look-up noisy and flat, because the value influences the score calculation even though only the key should. Second, the DNC's de-allocation of memory results in aliasing, which disturbs content-based look-up. Third, chaining memory reads with the temporal linkage matrix exponentially degrades the quality of the address distribution. Our proposed fixes for these problems yield improved performance on arithmetic tasks and also reduce the mean error rate on the bAbI question answering dataset by 43%.
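
The first and third problems can be illustrated with a small pure-Python sketch. It assumes (i) that a mask with entries in [0, 1] is applied elementwise to both the key and every memory row before the cosine similarity, so value fields cannot affect the score, and (ii) that link-distribution sharpening raises each weight to a power s >= 1 and renormalises. The function names `content_lookup`, `follow_links` and `sharpen`, the toy memory, the noisy link matrix, and the values beta = 10 and s = 3 are hypothetical choices for illustration; the paper's exact formulation may differ.

```python
# Hypothetical sketch of masked content look-up and link sharpening;
# not the paper's reference implementation.
import math
from typing import List, Optional

Vector = List[float]
Matrix = List[List[float]]


def cosine(a: Vector, b: Vector, eps: float = 1e-8) -> float:
    """Cosine similarity; returns ~0 for an all-zero vector instead of dividing by zero."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b + eps)


def softmax(xs: Vector) -> Vector:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def content_lookup(memory: Matrix, key: Vector, beta: float,
                   mask: Optional[Vector] = None) -> Vector:
    """Content-based address distribution over memory rows.

    Assumption: the mask is applied elementwise to the key AND to every
    memory row, so masked-out (value) fields do not influence the score."""
    if mask is not None:
        key = [k * m for k, m in zip(key, mask)]
        memory = [[x * m for x, m in zip(row, mask)] for row in memory]
    return softmax([beta * cosine(row, key) for row in memory])


def follow_links(link: Matrix, w: Vector) -> Vector:
    """One forward hop through the temporal link matrix: f_i = sum_j L[i][j] * w[j]."""
    return [sum(l_ij * w_j for l_ij, w_j in zip(row, w)) for row in link]


def sharpen(w: Vector, s: float) -> Vector:
    """Assumed sharpening form: raise each weight to the power s >= 1, then renormalise."""
    powered = [x ** s for x in w]
    total = max(sum(powered), 1e-12)
    return [p / total for p in powered]


def argmax(xs: Vector) -> int:
    return max(range(len(xs)), key=xs.__getitem__)


# Masking demo: columns 0-1 hold the key, columns 2-3 hold the value.
memory = [[1.0, 0.0, 5.0, 5.0],   # correct key, but a large value part
          [0.0, 1.0, 0.0, 0.0],
          [0.7, 0.7, 0.0, 0.0]]   # only partially matching key, empty value
query = [1.0, 0.0, 0.0, 0.0]
unmasked = content_lookup(memory, query, beta=10.0)
masked = content_lookup(memory, query, beta=10.0, mask=[1.0, 1.0, 0.0, 0.0])

# Link-chaining demo: slots written in order 0..4, with a noisy link matrix
# (0.8 on the true successor, 0.05 leaked to every other non-diagonal entry).
n = 5
link = [[0.8 if i == j + 1 else (0.0 if i == j else 0.05) for j in range(n)]
        for i in range(n)]
plain = [1.0] + [0.0] * (n - 1)   # start reading at slot 0
sharp = list(plain)
for _ in range(3):                # three forward hops should land on slot 3
    plain = follow_links(link, plain)
    sharp = sharpen(follow_links(link, sharp), s=3.0)
plain_share = plain[3] / sum(plain)   # plain weighting is not normalised
sharp_share = sharp[3]

print(f"unmasked argmax={argmax(unmasked)}, masked argmax={argmax(masked)}")
print(f"share on slot 3 after 3 hops: plain={plain_share:.3f}, sharpened={sharp_share:.3f}")
```

With these toy values the unmasked look-up peaks on row 2, whose key only partially matches, because the large value part of row 0 dilutes its cosine similarity; masking out the value columns moves the peak back to row 0. Likewise, after three plain hops only about 69% of the renormalised forward weighting remains on slot 3, while sharpening after every hop keeps more than 99% there.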
