Long Short-Term Memory with Dynamic Skip Connections

In recent years, the long short-term memory (LSTM) network has been successfully used to model sequential data of variable length. However, LSTMs can still have difficulty capturing long-term dependencies. In this work, we attempt to alleviate this problem by introducing dynamic skip connections, which learn to directly connect two dependent words. Because the training data contain no dependency annotations, we propose a novel reinforcement learning-based method to model the dependency relationships and connect dependent words. The proposed model computes the recurrent transition functions based on the skip connections, which gives it a dynamic skipping advantage over RNNs that always process entire sentences sequentially. Experimental results on three natural language processing tasks show that the proposed method achieves better performance than existing methods. In the number prediction experiment, the proposed model outperformed an LSTM by nearly 20% in accuracy.
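The abstract does not give implementation details, so the following is only a minimal PyTorch sketch of the general idea: at each timestep a small policy network samples which recent hidden state to skip-connect to, and the sampled decision can later be trained with REINFORCE. The class name DynamicSkipLSTM, the max_skip window, and the gating scheme for mixing the skipped state are illustrative assumptions, not the authors' exact formulation.

```python
# Minimal sketch (an assumption, not the paper's exact model) of an LSTM whose
# recurrent step samples a dynamic skip connection with a REINFORCE-style policy.
import torch
import torch.nn as nn
from torch.distributions import Categorical

class DynamicSkipLSTM(nn.Module):
    def __init__(self, input_size, hidden_size, max_skip=5):
        super().__init__()
        self.hidden_size = hidden_size
        self.cell = nn.LSTMCell(input_size, hidden_size)
        # Policy network: scores the max_skip most recent hidden states as skip targets.
        self.policy = nn.Linear(input_size + hidden_size, max_skip)
        # Learned gate that mixes h_{t-1} with the selected skipped hidden state.
        self.gate = nn.Linear(2 * hidden_size, hidden_size)

    def forward(self, x):
        # x: (seq_len, batch, input_size)
        batch = x.size(1)
        h = x.new_zeros(batch, self.hidden_size)
        c = x.new_zeros(batch, self.hidden_size)
        history = [h]                                   # h_0, ..., h_{t-1}
        outputs, log_probs = [], []
        batch_idx = torch.arange(batch, device=x.device)
        for x_t in x:
            # Sample how far back to skip (offset 0 selects h_{t-1}).
            dist = Categorical(logits=self.policy(torch.cat([x_t, h], dim=-1)))
            offset = dist.sample()                      # (batch,)
            log_probs.append(dist.log_prob(offset))
            past = torch.stack(history)                 # (t+1, batch, hidden)
            idx = (past.size(0) - 1 - offset).clamp(min=0)
            h_skip = past[idx, batch_idx]               # selected past hidden state
            # Gate the skipped state into the recurrent input before the LSTM step.
            g = torch.sigmoid(self.gate(torch.cat([h, h_skip], dim=-1)))
            h, c = self.cell(x_t, (g * h + (1 - g) * h_skip, c))
            history.append(h)
            outputs.append(h)
        return torch.stack(outputs), torch.stack(log_probs)
```

At training time the sampled skip decisions would be optimized with a REINFORCE term, for example adding -(reward.detach() * log_probs.sum(0)).mean() to the task loss, where the reward is derived from the downstream objective; the abstract does not specify the reward design or gating actually used in the paper.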
