Global-Local Mutual Attention Model for Text Classification

Text classification is a central problem in natural language processing (NLP). Although some models learn local semantic features and global long-term dependencies simultaneously, they typically combine the two by simple concatenation, either in cascade or in parallel, and ignore the mutual effects between them. In this paper, we propose the Global-Local Mutual Attention (GLMA) model for text classification, which introduces a mutual attention mechanism that lets local semantic features and global long-term dependencies learn from each other. The mutual attention mechanism consists of a Local-Guided Global-Attention (LGGA) and a Global-Guided Local-Attention (GGLA). The LGGA assigns weights to, and combines, the global long-term dependencies of semantically related word positions; it captures their combined semantics and alleviates the vanishing-gradient problem. The GGLA automatically assigns larger weights to relevant local semantic features, capturing key local semantic information while filtering out noise and irrelevant words or phrases. Furthermore, a weighted-over-time pooling operation is developed to aggregate the most informative and discriminative features for classification. Extensive experiments demonstrate that our model achieves state-of-the-art performance on seven benchmark datasets and sixteen Amazon product review datasets. Result analysis and visualization of the mutual attention weights further demonstrate the effectiveness of the proposed model.
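To make the architecture concrete, the sketch below gives one plausible reading of the mutual attention mechanism, not the paper's exact formulation. It assumes local semantic features come from a 1-D convolution over word embeddings and global long-term dependencies from a BiLSTM, and it realizes LGGA/GGLA as the two directions of a scaled dot-product affinity between the two feature sequences; all names (GLMASketch, pool_score, etc.) and dimension choices are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GLMASketch(nn.Module):
    """Illustrative sketch of global-local mutual attention for text
    classification. Assumptions (not from the paper): local features from a
    1-D CNN, global dependencies from a BiLSTM, scaled dot-product attention
    in both directions, and a learned weighted-over-time pooling."""

    def __init__(self, vocab_size, emb_dim=300, hidden=128, num_classes=2):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        # Local semantic features: windowed convolution over embeddings.
        self.conv = nn.Conv1d(emb_dim, 2 * hidden, kernel_size=3, padding=1)
        # Global long-term dependencies: bidirectional LSTM.
        self.bilstm = nn.LSTM(emb_dim, hidden, batch_first=True,
                              bidirectional=True)
        self.scale = (2 * hidden) ** 0.5
        # Weighted-over-time pooling: a learned score per time step.
        self.pool_score = nn.Linear(4 * hidden, 1)
        self.classifier = nn.Linear(4 * hidden, num_classes)

    def forward(self, token_ids):                              # (B, T)
        x = self.embed(token_ids)                              # (B, T, E)
        local = self.conv(x.transpose(1, 2)).transpose(1, 2)   # (B, T, 2H)
        glob, _ = self.bilstm(x)                               # (B, T, 2H)

        # Affinity between every local position and every global position.
        affinity = local @ glob.transpose(1, 2) / self.scale   # (B, T, T)

        # LGGA: local features attend over global dependencies, so each
        # position aggregates long-term context from semantically
        # related word positions.
        lgga = F.softmax(affinity, dim=-1) @ glob              # (B, T, 2H)

        # GGLA: global dependencies attend over local features, putting
        # larger weights on relevant local semantics and down-weighting
        # noisy or irrelevant words/phrases.
        ggla = F.softmax(affinity.transpose(1, 2), dim=-1) @ local  # (B, T, 2H)

        h = torch.cat([lgga, ggla], dim=-1)                    # (B, T, 4H)

        # Weighted-over-time pooling: softmax-normalized position weights
        # instead of plain max- or mean-pooling over time.
        w = F.softmax(self.pool_score(h), dim=1)               # (B, T, 1)
        pooled = (w * h).sum(dim=1)                            # (B, 4H)
        return self.classifier(pooled)
```

Under this reading, the two softmaxes over the shared affinity matrix are what make the attention "mutual": each view re-weights the other, and the pooled representation is a position-weighted mixture of both attended sequences rather than a plain concatenation of independent CNN and RNN outputs.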
