MODE-LSTM: A Parameter-efficient Recurrent Network with Multi-Scale for Sentence Classification

A central problem in sentence classification is extracting multi-scale n-gram features that capture the semantic meaning of sentences. Most existing models tackle this problem by stacking CNN and RNN layers, which easily leads to feature redundancy and overfitting on the relatively small datasets typical of this task. In this paper, we propose a simple yet effective model called the Multi-scale Orthogonal inDependEnt LSTM (MODE-LSTM), which is parameter-efficient, generalizes well, and captures multi-scale n-gram features. We disentangle the hidden state of the LSTM into several independently updated small hidden states and apply an orthogonal constraint to their recurrent matrices. We then equip this structure with sliding windows of different sizes to extract multi-scale n-gram features. Extensive experiments demonstrate that our model achieves better or competitive performance against state-of-the-art baselines on eight benchmark datasets. We also combine our model with BERT to further boost generalization performance.

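To make the described structure concrete, the following is a minimal PyTorch sketch, not the authors' released implementation. It assumes the full hidden state is split into `num_chunks` independently updated small states, uses orthogonal initialization of each chunk's recurrent matrix as a lightweight stand-in for the orthogonal constraint, and runs one such recurrent encoder per window size so that each step consumes an n-gram of concatenated word embeddings; all class names, window sizes, and the max-pooling readout are illustrative choices.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DisentangledLSTMCell(nn.Module):
    """LSTM cell whose hidden/cell states are split into independently updated chunks."""

    def __init__(self, input_size, hidden_size, num_chunks):
        super().__init__()
        assert hidden_size % num_chunks == 0
        self.num_chunks = num_chunks
        self.chunk_size = hidden_size // num_chunks
        # One small LSTMCell per chunk; recurrent weights start orthogonal
        # (a stand-in here for the orthogonality constraint in the abstract).
        self.cells = nn.ModuleList(
            nn.LSTMCell(input_size, self.chunk_size) for _ in range(num_chunks)
        )
        for cell in self.cells:
            nn.init.orthogonal_(cell.weight_hh)

    def forward(self, x, state):
        hs, cs = [], []
        for i, cell in enumerate(self.cells):
            h_i, c_i = cell(x, (state[0][i], state[1][i]))
            hs.append(h_i)
            cs.append(c_i)
        return hs, cs


class MultiScaleEncoder(nn.Module):
    """Runs a disentangled recurrent encoder over sliding windows of several sizes."""

    def __init__(self, embed_dim, hidden_size, num_chunks, window_sizes, num_classes):
        super().__init__()
        self.window_sizes = window_sizes
        self.num_chunks = num_chunks
        self.encoders = nn.ModuleList(
            DisentangledLSTMCell(w * embed_dim, hidden_size, num_chunks)
            for w in window_sizes
        )
        self.classifier = nn.Linear(hidden_size * len(window_sizes), num_classes)

    def forward(self, embeddings):  # embeddings: (batch, seq_len, embed_dim)
        batch, seq_len, dim = embeddings.shape
        features = []
        for w, encoder in zip(self.window_sizes, self.encoders):
            # Pad so every position has a full window, then concatenate the
            # w word embeddings in each window into one n-gram input vector.
            padded = F.pad(embeddings, (0, 0, 0, w - 1))
            windows = padded.unfold(1, w, 1)                      # (batch, seq_len, dim, w)
            windows = windows.permute(0, 1, 3, 2).reshape(batch, seq_len, w * dim)
            h = [embeddings.new_zeros(batch, encoder.chunk_size)
                 for _ in range(self.num_chunks)]
            c = [embeddings.new_zeros(batch, encoder.chunk_size)
                 for _ in range(self.num_chunks)]
            outputs = []
            for t in range(seq_len):
                h, c = encoder(windows[:, t], (h, c))
                outputs.append(torch.cat(h, dim=-1))
            # Max-pool the per-step hidden states into one feature per scale.
            features.append(torch.stack(outputs, dim=1).max(dim=1).values)
        return self.classifier(torch.cat(features, dim=-1))
```

For instance, MultiScaleEncoder(embed_dim=300, hidden_size=120, num_chunks=4, window_sizes=[1, 3, 5], num_classes=2) builds three window-size-specific encoders that share the disentangled-state design; the hyperparameters and the max-pooling readout are illustrative and are not taken from the paper's reported configuration.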