Learning Efficient Algorithms with Hierarchical Attentive Memory

In this paper, we propose and investigate a novel memory architecture for neural networks called Hierarchical Attentive Memory (HAM). It is based on a binary tree whose leaves correspond to memory cells. This allows HAM to perform memory access in O(log n) time, a significant improvement over the standard attention mechanism, which requires O(n) operations, where n is the size of the memory. We show that an LSTM network augmented with HAM can learn algorithms for problems such as merging, sorting, and binary searching from pure input-output examples. In particular, it learns to sort n numbers in O(n log n) time and generalizes well to input sequences much longer than those seen during training. We also show that HAM can be trained to act like classic data structures: a stack, a FIFO queue, and a priority queue.
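The tree-based access described above can be made concrete with a short sketch: a read walks from the root to one leaf, making a single left-or-right decision per level, so both reads and writes touch only O(log n) nodes. The snippet below is a minimal NumPy illustration, assuming plain linear/sigmoid stand-ins (`w_search`, `W_write`, and an averaging parent update) for the learned transformations a trained HAM would use; these names and formulas are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

# Minimal sketch of O(log n) tree-structured memory access.
# All names, shapes, and update rules are illustrative, not the authors'.
rng = np.random.default_rng(0)
d, depth = 8, 4          # cell size and tree depth (assumed values)
n = 2 ** depth           # number of memory cells (leaves)

# Perfect binary tree stored as an array: nodes[1] is the root,
# nodes[2*i] and nodes[2*i + 1] are the children of node i,
# and nodes[n] .. nodes[2*n - 1] are the leaves (the memory cells).
nodes = rng.normal(size=(2 * n, d))

w_search = rng.normal(size=2 * d)       # stand-in for a learned left/right gate
W_write = rng.normal(size=(2 * d, d))   # stand-in for a learned leaf update

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def access(query):
    """Walk from the root to a single leaf using O(log n) gating decisions."""
    i = 1
    while i < n:                                   # stop once i indexes a leaf
        p_right = sigmoid(np.concatenate([nodes[i], query]) @ w_search)
        i = 2 * i + int(p_right > 0.5)             # greedy choice for illustration
    return i

def write(query, leaf):
    """Overwrite the attended leaf, then refresh its O(log n) ancestors."""
    nodes[leaf] = np.tanh(np.concatenate([nodes[leaf], query]) @ W_write)
    i = leaf // 2
    while i >= 1:                                  # parent update: average children
        nodes[i] = 0.5 * (nodes[2 * i] + nodes[2 * i + 1])
        i //= 2

query = rng.normal(size=d)
leaf = access(query)          # O(log n) read
write(query, leaf)            # O(log n) update of the leaf and its ancestors
print("attended memory cell:", leaf - n)
```

In a trainable version the hard greedy choice would be replaced by a stochastic or soft decision at each node, but the traversal and its logarithmic cost stay the same; this is what makes n accesses, and hence sorting, fit in O(n log n) steps.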
