A Window-Based Self-Attention Approach for Sentence Encoding

Abstract: Recently, much progress has been made in sentence representation for natural language processing (NLP) tasks. Most existing methods use deep learning models with RNNs or CNNs to capture contextual information. These models typically treat every word in a sentence equally, ignoring the fact that keywords play a leading role in expressing a sentence's semantics. In particular, for sentence classification and semantic relatedness prediction, a few keywords often suffice to make the judgment; the whole sentence is not needed. To this end, we propose a window-based intra-weighing approach that weighs the words in a sentence. To compute the attentive weights, we take multi-window n-grams as input and apply max-pooling for feature extraction. We evaluate our model on six benchmark datasets, and the experimental results demonstrate its effectiveness in estimating word importance. Although our model has fewer parameters and much lower computational complexity than state-of-the-art models, it achieves comparable results.
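To make the mechanism concrete, the following is a minimal sketch (not the authors' released code) of window-based self-attention as the abstract describes it: per-word attentive weights are computed from multi-window n-gram features, with max-pooling used for feature extraction, and the sentence encoding is the weighted sum of word embeddings. All names, dimensions, and the exact ordering of pooling and scoring are assumptions for illustration.

```python
# Hypothetical sketch of window-based intra-weighing self-attention.
# Assumed details: one Conv1d per window size to extract n-gram features,
# max-pooling across window sizes, a linear scorer, and softmax weights.
import torch
import torch.nn as nn
import torch.nn.functional as F

class WindowSelfAttention(nn.Module):
    def __init__(self, embed_dim, window_sizes=(1, 3, 5), hidden_dim=64):
        super().__init__()
        # One 1-D convolution per window size extracts n-gram features
        # centered on each word (odd kernels + padding keep seq length).
        self.convs = nn.ModuleList(
            nn.Conv1d(embed_dim, hidden_dim, kernel_size=w, padding=w // 2)
            for w in window_sizes
        )
        self.score = nn.Linear(hidden_dim, 1)

    def forward(self, x, mask=None):
        # x: (batch, seq_len, embed_dim); mask: optional bool (batch, seq_len)
        h = x.transpose(1, 2)                        # (batch, embed_dim, seq_len)
        feats = torch.stack([conv(h) for conv in self.convs], dim=-1)
        # Max-pool across window sizes: keep the strongest n-gram feature
        # per word and per hidden dimension.
        feats, _ = feats.max(dim=-1)                 # (batch, hidden_dim, seq_len)
        scores = self.score(feats.transpose(1, 2)).squeeze(-1)  # (batch, seq_len)
        if mask is not None:
            scores = scores.masked_fill(~mask, float("-inf"))
        weights = F.softmax(scores, dim=-1)          # per-word attentive weights
        # Sentence embedding: attention-weighted sum of word embeddings.
        return torch.einsum("bs,bsd->bd", weights, x), weights
```

Note that, unlike full token-to-token self-attention, this scorer produces one weight per word from its local n-gram context, which is consistent with the abstract's claim of much lower computational complexity than Transformer-style models.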
