Learning to Execute

Recurrent Neural Networks (RNNs) with Long Short-Term Memory (LSTM) units are widely used because they are expressive and easy to train. Our interest lies in empirically evaluating the expressiveness and the learnability of LSTMs in the sequence-to-sequence regime by training them to evaluate short computer programs, a domain that has traditionally been seen as too complex for neural networks. We consider a simple class of programs that can be evaluated with a single left-to-right pass using constant memory. Our main result is that LSTMs can learn to map the character-level representations of such programs to their correct outputs. Notably, curriculum learning was necessary, and while conventional curriculum learning proved ineffective, we developed a new variant of curriculum learning that improved our networks' performance in all experimental conditions. The improved curriculum had a dramatic impact on an addition problem, making it possible to train an LSTM to add two 9-digit numbers with 99% accuracy.
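
As a rough illustration of the setup described above, the sketch below generates a toy (program, output) training pair that can be evaluated in a single left-to-right pass with constant memory, and encodes both sides at the character level for a sequence-to-sequence model. The program template, helper names, and vocabulary handling are illustrative assumptions, not the authors' actual data generator.

    import contextlib
    import io
    import random

    def sample_program(max_digits=4, rng=random):
        """Sample a tiny straight-line program (assumed template, for illustration)."""
        a = rng.randint(1, 10**max_digits - 1)
        b = rng.randint(1, 10**max_digits - 1)
        c = rng.randint(1, 9)
        return f"a={a}\nb={b}\nprint((a+b)*{c})"

    def evaluate(program):
        """Run the program and capture what it prints; this string is the target output."""
        buf = io.StringIO()
        with contextlib.redirect_stdout(buf):
            exec(program, {})
        return buf.getvalue().strip()

    def to_char_ids(text, vocab):
        """Character-level encoding: one integer id per character, growing the vocabulary as needed."""
        return [vocab.setdefault(ch, len(vocab)) for ch in text]

    if __name__ == "__main__":
        vocab = {}
        src = sample_program()
        tgt = evaluate(src)
        print(repr(src), "->", repr(tgt))
        print(to_char_ids(src, vocab), to_char_ids(tgt, vocab))

In a full experiment, pairs like these would be fed to an encoder-decoder LSTM character by character, with a curriculum that controls how difficult the sampled programs are (for example, by varying the number of digits); the specific difficulty schedule here is left unspecified, since the abstract only states that a new curriculum variant outperformed the conventional one.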
