IndyLSTMs: Independently Recurrent LSTMs

We introduce Independently Recurrent Long Short-term Memory cells: IndyLSTMs. These differ from regular LSTM cells in that the recurrent weights are not modeled as a full matrix, but as a diagonal matrix, i.e. the output and state of each LSTM cell depend on the inputs and its own output/state, as opposed to the input and the outputs/states of all the cells in the layer. The number of parameters per IndyLSTM layer, and thus the number of FLOPs per evaluation, is linear in the number of nodes in the layer, as opposed to quadratic for regular LSTM layers, resulting in potentially both smaller and faster models. We evaluate their performance experimentally by training several models on the popular IAM-OnDB and CASIA online handwriting datasets, as well as on several of our in-house datasets. We show that IndyLSTMs, despite their smaller size, consistently outperform regular LSTMs both in terms of accuracy per parameter and in best accuracy overall. We attribute this improved performance to IndyLSTMs being less prone to overfitting.
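
To make the difference concrete, below is a minimal sketch of a single IndyLSTM time step, assuming the standard LSTM gate structure; the function name, shapes, and gate ordering are illustrative choices, not taken from the paper. The defining change described above is that the recurrent contribution is an elementwise product with a per-unit weight vector rather than a multiplication by a full recurrent matrix.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def indy_lstm_step(x, h_prev, c_prev, W, u, b):
        # One IndyLSTM time step (illustrative sketch).
        #   x:      (d,)    input at the current time step
        #   h_prev: (n,)    previous layer output
        #   c_prev: (n,)    previous cell state
        #   W:      (4n, d) input-to-hidden weights for the i, f, g, o gates
        #   u:      (4n,)   per-unit recurrent weights (the "diagonal" recurrence)
        #   b:      (4n,)   biases
        # A regular LSTM would use a full (4n, n) recurrent matrix U and compute
        # U @ h_prev; here each unit sees only its own previous output, so the
        # recurrent parameter count is linear in n instead of quadratic.
        z = W @ x + u * np.tile(h_prev, 4) + b
        i, f, g, o = np.split(z, 4)
        c = sigmoid(f) * c_prev + sigmoid(i) * np.tanh(g)
        h = sigmoid(o) * np.tanh(c)
        return h, c

Under this sketch, a layer with n = 256 units needs 4 * 256 * 256 = 262,144 recurrent parameters as a regular LSTM but only 4 * 256 = 1,024 as an IndyLSTM; the input weights W and biases are unchanged.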
