A Lightweight Recurrent Network for Sequence Modeling

Recurrent networks have achieved great success on various sequential tasks with the assistance of complex recurrent units, but they suffer from severe computational inefficiency due to weak parallelization. One direction to alleviate this issue is to shift heavy computations outside the recurrence. In this paper, we propose a lightweight recurrent network, or LRN. LRN uses input and forget gates to handle long-range dependencies as well as gradient vanishing and explosion, with all parameter-related calculations factored outside the recurrence. The recurrence in LRN only manipulates the weight assigned to each token, tightly connecting LRN with self-attention networks. We apply LRN as a drop-in replacement for existing recurrent units in several neural sequence models. Extensive experiments on six NLP tasks show that LRN yields the best running efficiency with little or no loss in model performance.
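The sketch below illustrates, in plain NumPy, the kind of cell the abstract describes: the gate and value projections (assumed here to be three linear maps `W_q`, `W_k`, `W_v`) are computed for the entire sequence before the loop, so the recurrence itself contains only cheap elementwise operations. This is a minimal illustration based on the abstract, not the paper's exact formulation; all function and parameter names are illustrative.

```python
import numpy as np

def lightweight_recurrence(x, W_q, W_k, W_v, h0=None):
    """Minimal sketch of a lightweight recurrent layer.

    All parameter-related computation (the three projections) happens
    outside the loop and can be parallelized over time; the loop only
    applies elementwise gating to the running state.
    Shapes: x is (T, d_in); W_q, W_k, W_v are (d_in, d_hid).
    """
    # Heavy matrix multiplications, done once for the whole sequence.
    q = x @ W_q                          # drives the input gate
    k = x @ W_k                          # drives the forget gate
    v = x @ W_v                          # candidate values per token

    i_gate = 1.0 / (1.0 + np.exp(-q))    # input gate in (0, 1)
    f_gate = 1.0 / (1.0 + np.exp(-k))    # forget gate in (0, 1)

    T, d_hid = v.shape
    h = np.zeros(d_hid) if h0 is None else h0
    outputs = np.empty((T, d_hid))
    for t in range(T):
        # Only sequential work: reweight the old state and the new token.
        h = np.tanh(f_gate[t] * h + i_gate[t] * v[t])
        outputs[t] = h
    return outputs

# Toy usage with random parameters (hypothetical sizes).
rng = np.random.default_rng(0)
x = rng.standard_normal((5, 8))
W_q, W_k, W_v = (rng.standard_normal((8, 16)) for _ in range(3))
h = lightweight_recurrence(x, W_q, W_k, W_v)   # shape (5, 16)
```

Because the loop holds no weight matrices, unrolling it across time steps amounts to repeatedly rescaling each token's contribution, which is what the abstract means by the recurrence only manipulating per-token weights, analogous to attention scores.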
