A Label-Specific Attention-Based Network with Regularized Loss for Multi-label Classification

In multi-label text classification, different parts of a document do not contribute equally to predicting labels, yet most existing approaches ignore this. The few methods that do apply attention compute the attention weights from the hidden representations of the neural network alone, without incorporating label information. In this work, we propose an improved attention-based neural network model for multi-label text classification that obtains the attention weights by computing the similarity between each label and each word of the document. Injecting label information into the text representation in this way lets the model select the most informative words for each label. In addition, unlike single-label classification, the labels in multi-label classification may be correlated, for example through co-occurrence or conditional-probability relationships. We therefore also propose a regularization term for this model that exploits label correlations via the label co-occurrence matrix. Experimental results on the AAPD and RCV1-V2 datasets demonstrate that the proposed model yields significant performance gains over many state-of-the-art approaches.
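
To make the attention mechanism concrete, below is a minimal sketch of label-specific attention in PyTorch, assuming one trainable embedding per label and a sequence encoder (e.g. a BiLSTM) that produces one hidden state per word; all names and shapes here are illustrative, not the authors' actual implementation.

```python
import torch
import torch.nn as nn

class LabelAttention(nn.Module):
    """Attention weights from label-word similarity (illustrative sketch)."""

    def __init__(self, n_labels: int, hidden_dim: int):
        super().__init__()
        # One trainable embedding per label, in the same space as encoder states.
        self.label_emb = nn.Parameter(torch.randn(n_labels, hidden_dim))

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # hidden: (batch, seq_len, hidden_dim) word representations from the encoder.
        # Score each (label, word) pair by dot-product similarity.
        scores = torch.einsum("ld,bsd->bls", self.label_emb, hidden)
        weights = torch.softmax(scores, dim=-1)   # normalize over the words
        # One attended document representation per label.
        return torch.bmm(weights, hidden)         # (batch, n_labels, hidden_dim)
```

Each label-specific representation can then be scored by a per-label linear layer followed by a sigmoid to produce that label's probability.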

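The abstract does not spell out the exact form of the regularization term, so the following is one plausible instantiation under that caveat: a graph-Laplacian-style penalty that pulls the classifier weights of frequently co-occurring labels toward each other, with the co-occurrence matrix estimated from training-set label counts.

```python
import torch

def cooccurrence_regularizer(W: torch.Tensor, C: torch.Tensor) -> torch.Tensor:
    """W: (n_labels, dim) per-label classifier weight vectors.
    C: (n_labels, n_labels) label co-occurrence counts (illustrative form)."""
    C = C / C.max()                    # scale counts to [0, 1]
    dists = torch.cdist(W, W).pow(2)   # pairwise squared distances between labels
    return (C * dists).sum() / 2       # co-occurring labels -> similar classifiers

# Total loss (sketch): binary cross-entropy over labels plus
# lambda * cooccurrence_regularizer(W, C), with lambda a tuned hyperparameter.
```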