Strongly-Typed Recurrent Neural Networks

Recurrent neural networks are increasingly popular models for sequential learning. Unfortunately, although the most effective RNN architectures are perhaps excessively complicated, extensive searches have not found simpler alternatives. This paper imports ideas from physics and functional programming into RNN design to provide guiding principles. From physics, we introduce type constraints, analogous to the constraints that forbid adding meters to seconds. From functional programming, we require that strongly-typed architectures factorize into stateless learnware and state-dependent firmware, reducing the impact of side-effects. The features learned by strongly-typed nets have a simple semantic interpretation via dynamic average-pooling on one-dimensional convolutions. We also show that strongly-typed gradients are better behaved than in classical architectures, and characterize the representational power of strongly-typed nets. Finally, experiments show that, despite being more constrained, strongly-typed architectures achieve lower training error and comparable generalization error to classical architectures.
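The abstract does not spell out the update equations, but the factorization it describes can be illustrated with a minimal sketch. Assume a strongly-typed vanilla-RNN form in which the candidate features and the gate are computed from the current input alone (the stateless "learnware"), while the hidden state enters only through elementwise gating (the state-dependent "firmware"). The function name, shapes, and initialization below are illustrative assumptions, not taken from the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def strongly_typed_rnn(xs, W, V, b):
    """Sketch of a strongly-typed recurrent update (assumed T-RNN-style form).

    Learnware: W and V act only on the current input x_t (stateless, no side-effects).
    Firmware: the hidden state h enters only through elementwise gating, so each
    h_t is a dynamic weighted average of the per-timestep features z_1, ..., z_t.
    """
    h = np.zeros(W.shape[0])
    hs = []
    for x in xs:
        z = W @ x                  # candidate features from the input alone (a width-1 1-D convolution)
        f = sigmoid(V @ x + b)     # gate, also computed from the input alone
        h = f * h + (1.0 - f) * z  # state updated only by elementwise mixing: dynamic average-pooling
        hs.append(h)
    return np.stack(hs)

# Illustrative usage with random data (shapes are arbitrary).
rng = np.random.default_rng(0)
xs = rng.standard_normal((10, 4))        # sequence of 10 inputs of dimension 4
W = 0.1 * rng.standard_normal((8, 4))    # learnware weights
V = 0.1 * rng.standard_normal((8, 4))
b = np.zeros(8)
states = strongly_typed_rnn(xs, W, V, b)  # (10, 8) array of hidden states
```

Unrolling the recursion shows that each h_t is a convex combination of the features z_1, ..., z_t with weights determined by the gates, which is the "dynamic average-pooling on one-dimensional convolutions" interpretation mentioned in the abstract.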
