Dynamic Global-Local Attention Network Based On Capsules for Text Classification

Text classification requires a comprehensive consideration of both global and local information in the text. However, most methods treat the global and local features of a text as two separate parts and ignore the relationship between them. In this paper, we propose a Dynamic Global-Local Attention Network based on Capsules (DGLA) that uses global features to dynamically adjust the importance of local features (e.g., sentence-level or phrase-level features). The global features of the text are extracted by a capsule network, which captures the relative positional relationships among the input features to mine additional hidden information. Furthermore, we design two global-local attention mechanisms within DGLA to measure the importance of the two kinds of local features, and we combine the advantages of these two attention mechanisms through a residual network. The model was evaluated on seven benchmark text classification datasets, and DGLA achieved the highest accuracy on all of them. Ablation experiments show that the global-local attention mechanism significantly improves the performance of the model.
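The paper does not include code, but the following is a minimal PyTorch sketch of how the global-local attention and the residual combination of the two branches described above might look. The module names (GlobalLocalAttention, DGLASketch), the dimensions, and the dot-product scoring are illustrative assumptions, not the authors' implementation; the capsule-based extraction of the global feature vector is omitted and treated as an input.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class GlobalLocalAttention(nn.Module):
    """Scores local features (e.g., sentence- or phrase-level vectors) against
    a global feature vector and returns their attention-weighted sum.
    (Hypothetical sketch; scoring function is assumed.)"""

    def __init__(self, global_dim, local_dim):
        super().__init__()
        self.proj = nn.Linear(global_dim, local_dim, bias=False)

    def forward(self, global_feat, local_feats):
        # global_feat: (batch, global_dim); local_feats: (batch, n_local, local_dim)
        query = self.proj(global_feat).unsqueeze(1)              # (batch, 1, local_dim)
        scores = torch.bmm(local_feats, query.transpose(1, 2))   # (batch, n_local, 1)
        weights = F.softmax(scores, dim=1)                       # importance of each local feature
        return (weights * local_feats).sum(dim=1)                # (batch, local_dim)


class DGLASketch(nn.Module):
    """Combines a sentence-level and a phrase-level global-local attention
    branch with a residual-style addition, then classifies. The global
    feature (from the capsule network in the paper) is passed in directly."""

    def __init__(self, global_dim, local_dim, num_classes):
        super().__init__()
        self.sent_attn = GlobalLocalAttention(global_dim, local_dim)
        self.phrase_attn = GlobalLocalAttention(global_dim, local_dim)
        self.classifier = nn.Linear(local_dim, num_classes)

    def forward(self, global_feat, sent_feats, phrase_feats):
        sent_ctx = self.sent_attn(global_feat, sent_feats)
        phrase_ctx = self.phrase_attn(global_feat, phrase_feats)
        fused = sent_ctx + phrase_ctx   # residual-style combination of the two branches
        return self.classifier(fused)


# Example usage with random tensors (batch of 2, 4 sentence vectors, 6 phrase vectors):
model = DGLASketch(global_dim=16, local_dim=32, num_classes=5)
g = torch.randn(2, 16)
sents = torch.randn(2, 4, 32)
phrases = torch.randn(2, 6, 32)
logits = model(g, sents, phrases)   # shape: (2, 5)
```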
