Label-Attentive Hierarchical Attention Network for Text Classification

The performance of text classification largely hinges on feature extraction from text through representation learning. Recently, many deep learning models have proved to be state-of-the-art methods for text classification. However, these approaches often ignore global clues, such as label information, which results in partial semantic loss. This paper proposes a text classification framework called the Label-Attentive Hierarchical Attention Network (LAHAN). First, to better exploit global clues and avoid semantic loss between text and label information, LAHAN generates label-attentive embeddings by introducing joint features of text words and labels. Then, a hierarchical architecture utilizes the label information at both the word level and the sentence level, yielding a better hierarchical text representation. Finally, the classification layer predicts the category from the full-text label-attentive representation. Experimental results on several benchmark datasets demonstrate that the proposed method improves accuracy over other state-of-the-art baselines.
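
To make the described architecture concrete, the following is a minimal PyTorch sketch of a label-attentive hierarchical attention network: label embeddings are attended to at the word level to form sentence vectors and again at the sentence level to form a document vector, which the classification layer then scores. The layer sizes, the exact attention formulation, and all names here are illustrative assumptions, not the paper's published configuration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LabelAttentiveHAN(nn.Module):
    """Hypothetical sketch of a label-attentive hierarchical attention network.
    Dimensions and the attention form are assumptions for illustration only."""

    def __init__(self, vocab_size, num_labels, embed_dim=100, hidden_dim=50):
        super().__init__()
        self.word_embed = nn.Embedding(vocab_size, embed_dim)
        # Label embeddings supply the global clues attended to at both levels.
        self.label_embed = nn.Embedding(num_labels, 2 * hidden_dim)
        self.word_gru = nn.GRU(embed_dim, hidden_dim, bidirectional=True, batch_first=True)
        self.sent_gru = nn.GRU(2 * hidden_dim, hidden_dim, bidirectional=True, batch_first=True)
        self.classifier = nn.Linear(2 * hidden_dim, num_labels)

    def label_attention(self, states):
        # states: (batch, seq_len, 2*hidden). Score each position against every
        # label embedding, pool over labels, and softmax over positions.
        scores = states @ self.label_embed.weight.t()           # (batch, seq, num_labels)
        weights = F.softmax(scores.max(dim=-1).values, dim=1)   # (batch, seq)
        return (weights.unsqueeze(-1) * states).sum(dim=1)      # (batch, 2*hidden)

    def forward(self, docs):
        # docs: (batch, num_sents, num_words) tensor of word indices.
        batch, num_sents, num_words = docs.shape
        words = self.word_embed(docs.view(batch * num_sents, num_words))
        word_states, _ = self.word_gru(words)
        sent_vecs = self.label_attention(word_states).view(batch, num_sents, -1)
        sent_states, _ = self.sent_gru(sent_vecs)
        doc_vec = self.label_attention(sent_states)
        return self.classifier(doc_vec)                          # (batch, num_labels)

# Example usage with dummy indices:
model = LabelAttentiveHAN(vocab_size=10000, num_labels=5)
logits = model(torch.randint(0, 10000, (2, 4, 20)))  # 2 docs, 4 sentences, 20 words each
```

The sketch uses bidirectional GRU encoders at both levels, mirroring the hierarchical word/sentence design described in the abstract; a shared label-embedding table is one simple way to inject label information at both levels.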
