Stacked Residual Recurrent Neural Networks With Cross-Layer Attention for Text Classification

Text classification is a fundamental task in natural language processing and underpins applications such as sentiment analysis and question classification. Different NLP tasks require different linguistic features: text classification relies more heavily on semantic features, whereas tasks such as dependency parsing depend more on syntactic features. Most existing methods improve performance by mixing and calibrating features without distinguishing the types of features or their respective effects. In this paper, we propose SRCLA, a stacked residual recurrent neural network with cross-layer attention, which filters out more semantic features for text classification. We first build a stacked network structure that separates different types of linguistic features, and then propose a novel cross-layer attention mechanism in which higher-level features supervise lower-level features to refine the filtering process. This allows more semantic features to be selected for text classification. We conduct experiments on eight text classification tasks, covering sentiment analysis, question classification, and subjectivity classification, and compare against a broad range of baselines. Experimental results show that the proposed approach achieves state-of-the-art results on 5 of the 8 tasks.
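
The abstract only sketches the architecture, so the following is a minimal, hedged illustration of the idea rather than the authors' implementation: stacked bidirectional LSTM layers with residual connections, plus a cross-layer attention step in which the top layer's features re-weight the features of the layer below. The class names (SRCLASketch, CrossLayerAttention), layer sizes, pooling strategy, and the exact attention formulation are all assumptions made for illustration.

```python
# A sketch of the SRCLA idea, assuming a PyTorch-style formulation.
# Hyperparameters and the attention form are illustrative, not the paper's.
import torch
import torch.nn as nn
import torch.nn.functional as F


class CrossLayerAttention(nn.Module):
    """Higher-level features 'supervise' (re-weight) lower-level features."""

    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, lower, higher):
        # lower, higher: (batch, seq_len, dim)
        # Assumption: use the mean of the higher-level states as a query.
        query = self.proj(higher.mean(dim=1))          # (batch, dim)
        scores = torch.bmm(lower, query.unsqueeze(2))  # (batch, seq_len, 1)
        weights = F.softmax(scores, dim=1)
        return weights * lower                          # re-weighted lower features


class SRCLASketch(nn.Module):
    def __init__(self, vocab_size, emb_dim=300, hidden=150,
                 num_layers=3, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        dim = 2 * hidden  # bidirectional LSTM output size
        self.input_proj = nn.Linear(emb_dim, dim)
        # Stack of BiLSTM layers; num_layers >= 2 is assumed so that a
        # "higher" and a "lower" layer exist for cross-layer attention.
        self.layers = nn.ModuleList(
            nn.LSTM(dim, hidden, batch_first=True, bidirectional=True)
            for _ in range(num_layers)
        )
        self.attn = CrossLayerAttention(dim)
        self.classifier = nn.Linear(dim, num_classes)

    def forward(self, tokens):
        x = self.input_proj(self.embed(tokens))  # (batch, seq, dim)
        outputs = []
        for lstm in self.layers:
            h, _ = lstm(x)
            x = x + h                             # residual connection
            outputs.append(x)
        # Cross-layer attention: the top layer supervises the layer below.
        refined = self.attn(outputs[-2], outputs[-1])
        pooled = refined.max(dim=1).values        # max-pool over time
        return self.classifier(pooled)


# Toy usage: a batch of 4 "sentences" of 20 token ids each.
model = SRCLASketch(vocab_size=10000)
logits = model(torch.randint(0, 10000, (4, 20)))
print(logits.shape)  # torch.Size([4, 2])
```

In this sketch the residual connections let lower-level features flow upward through the stack, while the cross-layer attention uses the higher-level representation as a query to emphasize semantically relevant positions in the lower layer, mirroring the "higher-level features supervise lower-level features" description in the abstract.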
