State-Space Constraints Improve the Generalization of the Differentiable Neural Computer in Some Algorithmic Tasks

Memory-augmented neural networks (MANNs) can solve algorithmic tasks such as sorting. However, they often fail to generalize to input-sequence lengths not seen during training. We therefore introduce two approaches that constrain the state-space of the network controller to improve generalization to out-of-distribution-sized input sequences: state compression and state regularization. We show that both approaches can improve the generalization capability of a particular type of MANN, the differentiable neural computer (DNC), and compare them to a stateful and a stateless controller on a set of algorithmic tasks. Furthermore, we show that the combination of both approaches in particular enables a pre-trained DNC to be extended post hoc with a larger memory. Our approaches thus allow a DNC to be trained on shorter input sequences, saving computational resources. Moreover, we observe that the capability to generalize is often accompanied by loop structures in the state-space, which could correspond to looping constructs in algorithms.
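
To make the two constraints concrete, the sketch below shows one plausible way to impose them on a recurrent controller: a linear bottleneck that forces the hidden state through a low-dimensional code (state compression), plus an L2 penalty on that code (one simple form of state regularization). This is a minimal illustrative sketch, not the authors' implementation; the class and parameter names (CompressedStateController, compressed_size) are hypothetical, and the paper's exact formulation may differ.

```python
import torch
import torch.nn as nn


class CompressedStateController(nn.Module):
    """LSTM controller with a state bottleneck (state compression) and an
    L2 penalty on the compressed state (a simple form of state
    regularization). Illustrative sketch, not the paper's code."""

    def __init__(self, input_size: int, hidden_size: int, compressed_size: int):
        super().__init__()
        self.lstm = nn.LSTMCell(input_size, hidden_size)
        # Bottleneck: project the hidden state down and back up, forcing the
        # controller to operate in a low-dimensional state-space.
        self.compress = nn.Linear(hidden_size, compressed_size)
        self.decompress = nn.Linear(compressed_size, hidden_size)

    def forward(self, x, state):
        h, c = self.lstm(x, state)
        z = torch.tanh(self.compress(h))   # compressed controller state
        h = self.decompress(z)             # reconstructed hidden state
        reg = z.pow(2).mean()              # penalty keeping states compact
        # Feed the reconstructed state back, so the constraint shapes the
        # recurrent dynamics rather than only the output.
        return h, (h, c), reg
```

During training, the returned penalty would be weighted into the task loss (e.g. loss = task_loss + lambda * reg), pulling the controller's trajectories into a compact region of the state-space where loop-like structures can form.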
