Attention model with multi-layer supervision for text classification

Text classification is a classic task in natural language processing. In this study, we propose an attention model with multi-layer supervision for this task. In our model, the context vector produced by the previous layer is used directly as the attention query to select the relevant features, and multi-layer supervision is applied to text classification, i.e., the prediction losses of all layers are combined in the global cost function. The main contribution of our model is that the context vector serves not only as attention but also as a representation of the input text for classification at each layer. We conducted experiments on five benchmark text classification data sets, and the results indicate that our model improves classification performance on most of them.
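
The following is a minimal sketch of the idea described above, not the authors' released code. It assumes a trainable embedding layer (a stand-in for pretrained word vectors), a BiLSTM encoder, and dot-product attention in which the context vector from the previous layer serves as the query for the next layer; the layer names, dimensions, and number of attention layers are illustrative only.

# Sketch of a multi-layer attention classifier with per-layer supervision.
# Assumptions (not from the paper): BiLSTM encoder, dot-product attention,
# a learnable initial query vector, and equal weighting of the per-layer losses.
import torch
import torch.nn as nn


class MultiLayerAttentionClassifier(nn.Module):
    def __init__(self, vocab_size, embed_dim, hidden_dim, num_classes, num_layers=3):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.encoder = nn.LSTM(embed_dim, hidden_dim, batch_first=True,
                               bidirectional=True)
        enc_dim = 2 * hidden_dim
        self.num_layers = num_layers
        # One classifier per layer: each layer's context vector is also used
        # as a text representation for prediction (multi-layer supervision).
        self.classifiers = nn.ModuleList(
            [nn.Linear(enc_dim, num_classes) for _ in range(num_layers)]
        )
        # Query used at the first attention layer (hypothetical choice).
        self.init_query = nn.Parameter(torch.zeros(enc_dim))

    def forward(self, token_ids):
        # token_ids: (batch, seq_len)
        h, _ = self.encoder(self.embedding(token_ids))    # (batch, seq_len, enc_dim)
        query = self.init_query.expand(h.size(0), -1)     # (batch, enc_dim)
        logits_per_layer = []
        for layer in range(self.num_layers):
            # Dot-product attention: the previous context vector acts as the query.
            scores = torch.bmm(h, query.unsqueeze(2)).squeeze(2)     # (batch, seq_len)
            weights = torch.softmax(scores, dim=1)
            context = torch.bmm(weights.unsqueeze(1), h).squeeze(1)  # (batch, enc_dim)
            logits_per_layer.append(self.classifiers[layer](context))
            query = context   # this layer's context vector becomes the next query
        return logits_per_layer


def multi_layer_loss(logits_per_layer, labels):
    # Global cost function: sum of the prediction losses from all layers.
    ce = nn.CrossEntropyLoss()
    return sum(ce(logits, labels) for logits in logits_per_layer)


if __name__ == "__main__":
    model = MultiLayerAttentionClassifier(vocab_size=10000, embed_dim=100,
                                          hidden_dim=128, num_classes=5)
    tokens = torch.randint(1, 10000, (4, 20))    # toy batch of 4 sequences
    labels = torch.randint(0, 5, (4,))
    loss = multi_layer_loss(model(tokens), labels)
    loss.backward()
    print(loss.item())

At inference time, a natural choice under these assumptions is to predict from the final layer's logits (or an average over layers); the paper's abstract does not specify which variant is used.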
