Self-Attention Enhanced Recurrent Neural Networks for Sentence Classification

In this paper, we propose self-attention enhanced recurrent neural networks for the task of sentence classification. The proposed framework is built on the vanilla recurrent neural network (RNN) and bidirectional RNN architectures, each implemented with two different recurrent cells: the Long Short-Term Memory (LSTM) cell and the Gated Recurrent Unit (GRU). We use a multi-head self-attention mechanism to improve feature selection and thereby preserve long-range dependencies in the recurrent architectures. Further, to ensure better context modelling, we use Mikolov's pre-trained word2vec word vectors in both static and non-static modes. To assess the efficacy of the proposed framework, we compare our models with the state-of-the-art methods of Yoon Kim on seven benchmark datasets. The proposed framework achieves state-of-the-art results on four of the seven datasets and a performance gain over the baseline model on five of the seven. Furthermore, to examine the effectiveness of self-attention for sentence classification, we compare our self-attention-based framework with the Bahdanau-attention-based implementation from our previous work.
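As a concrete illustration of the kind of architecture described above, the PyTorch sketch below wires a bidirectional LSTM encoder to a multi-head self-attention layer over its hidden states, with a word2vec-style embedding table that can be frozen (static) or fine-tuned (non-static). This is a minimal sketch under assumed settings: the layer sizes, head count, mean-pooling readout, class count, and all names (`SelfAttentiveBiLSTM`, `pretrained`, `static`) are illustrative, not the configuration or code reported in the paper.

```python
# Minimal sketch of a self-attention enhanced BiLSTM classifier.
# Hyperparameters are illustrative assumptions, not the paper's settings.
import torch
import torch.nn as nn


class SelfAttentiveBiLSTM(nn.Module):
    def __init__(self, vocab_size, embed_dim=300, hidden_dim=150,
                 num_heads=4, num_classes=2, pretrained=None, static=True):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        if pretrained is not None:
            # word2vec-style initialization: "static" freezes the vectors,
            # "non-static" fine-tunes them during training.
            self.embedding.weight.data.copy_(pretrained)
            self.embedding.weight.requires_grad = not static
        self.bilstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True,
                              bidirectional=True)
        # Multi-head self-attention over the BiLSTM hidden states
        # (2 * hidden_dim because of the two directions).
        self.attn = nn.MultiheadAttention(2 * hidden_dim, num_heads,
                                          batch_first=True)
        self.classifier = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, token_ids):
        x = self.embedding(token_ids)          # (batch, seq, embed_dim)
        h, _ = self.bilstm(x)                  # (batch, seq, 2*hidden_dim)
        # Queries, keys, and values all come from the recurrent states,
        # so every position can attend to every other position.
        a, _ = self.attn(h, h, h)
        return self.classifier(a.mean(dim=1))  # mean-pool, then classify


# Example forward pass on random token ids.
model = SelfAttentiveBiLSTM(vocab_size=10000)
logits = model(torch.randint(0, 10000, (8, 20)))  # batch of 8, length 20
print(logits.shape)  # torch.Size([8, 2])
```

Note that in this sketch the static and non-static variants differ only in whether the embedding weights receive gradients; the rest of the pipeline is identical, which is what makes a static versus non-static comparison a controlled one.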

[1] Riitta Pirinen. The Construction of Women's Positions in Sport: A Textual Analysis of Articles on Female Athletes in Finnish Women's Magazines, 1997.

[2] Choochart Haruechaiyasak, et al. Effectiveness of social media text classification by utilizing the online news category, 2015, 2nd International Conference on Advanced Informatics: Concepts, Theory and Applications (ICAICTA).

[3] Joseph Chee Chang, et al. Recurrent-Neural-Network for Language Detection on Twitter Code-Switching Corpus, 2014, ArXiv.

[4] Spyros Kotoulas, et al. Medical Text Classification using Convolutional Neural Networks, 2017, Studies in Health Technology and Informatics.

[5] Bing Liu, et al. Attention-Based Recurrent Neural Network Models for Joint Intent Detection and Slot Filling, 2016, INTERSPEECH.

[6] Hsinchun Chen, et al. Textual analysis of stock market prediction using breaking financial news: The AZFin text system, 2009, TOIS.

[7] Yoon Kim. Convolutional Neural Networks for Sentence Classification, 2014, EMNLP.

[8] Jürgen Schmidhuber, et al. Long Short-Term Memory, 1997, Neural Computation.

[9] Sepp Hochreiter. The Vanishing Gradient Problem During Learning Recurrent Neural Nets and Problem Solutions, 1998, Int. J. Uncertain. Fuzziness Knowl. Based Syst.

[10] Christopher Potts, et al. Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank, 2013, EMNLP.

[11] Geoffrey Zweig, et al. Using Recurrent Neural Networks for Slot Filling in Spoken Language Understanding, 2015, IEEE/ACM Transactions on Audio, Speech, and Language Processing.

[12] Reshma Rastogi, et al. Attentional Recurrent Neural Networks for Sentence Classification, 2019.

[13] Bowen Zhou, et al. A Structured Self-attentive Sentence Embedding, 2017, ICLR.

[14] Bing Liu, et al. Mining and summarizing customer reviews, 2004, KDD.

[15] Claire Cardie, et al. Annotating Expressions of Opinions and Emotions in Language, 2005, Lang. Resour. Evaluation.

[16] Garen Arevian. Recurrent Neural Networks for Robust Real-World Text Classification, 2007.

[17] Quoc V. Le, et al. Sequence to Sequence Learning with Neural Networks, 2014, NIPS.

[18] Jimmy Ba, et al. Adam: A Method for Stochastic Optimization, 2014, ICLR.

[19] Geoffrey E. Hinton, et al. Speech recognition with deep recurrent neural networks, 2013, IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[20] Razvan Pascanu, et al. On the difficulty of training recurrent neural networks, 2012, ICML.

[21] Pierre Baldi, et al. Improving the prediction of protein secondary structure in three and eight classes using recurrent neural networks and profiles, 2002, Proteins.

[22] Kuldip K. Paliwal, et al. Bidirectional recurrent neural networks, 1997, IEEE Trans. Signal Process.

[23] Yoshua Bengio, et al. Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling, 2014, ArXiv.

[24] Yoshua Bengio, et al. Neural Machine Translation by Jointly Learning to Align and Translate, 2014, ICLR.

[25] Bo Pang, et al. Seeing Stars: Exploiting Class Relationships for Sentiment Categorization with Respect to Rating Scales, 2005, ACL.

[26] K. Robert Lai, et al. Dimensional Sentiment Analysis Using a Regional CNN-LSTM Model, 2016, ACL.

[27] Jeffrey Dean, et al. Distributed Representations of Words and Phrases and their Compositionality, 2013, NIPS.

[28] Samy Bengio, et al. Show and tell: A neural image caption generator, 2015, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[29] Yoshua Bengio, et al. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation, 2014, EMNLP.

[30] Bo Pang, et al. A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts, 2004, ACL.

[31] Jun Li, et al. Short Text Emotion Analysis Based on Recurrent Neural Network, 2017, ICIE '17.

[32] Claire Cardie, et al. Opinion Mining with Deep Recurrent Neural Networks, 2014, EMNLP.

[33] Dan Roth, et al. Learning question classifiers: the role of semantic information, 2005, Natural Language Engineering.

[34] Ting Liu, et al. Document Modeling with Gated Recurrent Neural Network for Sentiment Classification, 2015, EMNLP.