Improved sequence generation model for multi-label classification via CNN and initialized fully connection

Abstract In multi-label text classification, modeling the correlations between labels is an important yet challenging task, because the number of possible label combinations grows exponentially with the size of the label space. In recent years, neural network models have been widely applied in this field and have gradually achieved satisfactory performance. However, existing methods either fail to fully model the internal correlations among labels or fail to capture the local and global semantic information of the text simultaneously, which ultimately degrades classification results. In this paper, we implement a novel model for multi-label classification based on sequence-to-sequence learning, composed of two different neural network modules, named the encoder and the decoder respectively. The encoder uses a convolutional neural network to extract high-level local sequential semantics, which are combined with the word vectors to generate the final text representation through a recurrent neural network and an attention mechanism. The decoder, besides using a recurrent neural network to capture global label correlations, employs an initialized fully connected layer to capture the correlation between any two different labels. When trained on the RCV1-v2, AAPD and Ren-CECps datasets, the proposed model outperforms previous work on the main evaluation metrics of hamming loss and micro-F1 score.
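
To make the architecture concrete, below is a minimal PyTorch sketch of the encoder-decoder described in the abstract. All class names, dimensions, and the exact form of the co-occurrence-based initialization are illustrative assumptions, not the authors' released implementation; in particular, the label-to-label linear layer seeded with co-occurrence statistics is one plausible reading of the "initialized fully connection".

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Encoder(nn.Module):
    """A CNN over word vectors extracts local n-gram semantics; these are
    concatenated with the word vectors and fed to a BiLSTM whose states
    serve as the text representation that the decoder attends over."""
    def __init__(self, vocab_size, emb_dim=300, n_filters=128, kernel=3, hidden=256):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.conv = nn.Conv1d(emb_dim, n_filters, kernel, padding=kernel // 2)
        self.rnn = nn.LSTM(emb_dim + n_filters, hidden,
                           batch_first=True, bidirectional=True)

    def forward(self, tokens):                                # (B, T)
        emb = self.embed(tokens)                              # (B, T, E)
        local = F.relu(self.conv(emb.transpose(1, 2)))        # (B, F, T)
        feats = torch.cat([emb, local.transpose(1, 2)], -1)   # (B, T, E+F)
        states, _ = self.rnn(feats)                           # (B, T, 2*hidden)
        return states

class Decoder(nn.Module):
    """An LSTM over the label sequence captures global label correlation;
    a label-to-label linear layer, initialized here from assumed label
    co-occurrence statistics, adjusts every label's score by every other
    label's score (pairwise correlation)."""
    def __init__(self, n_labels, enc_dim, hidden=256, cooccur=None):
        super().__init__()
        self.label_embed = nn.Embedding(n_labels + 1, hidden)  # last index = <bos>
        self.cell = nn.LSTMCell(hidden + enc_dim, hidden)
        self.attn = nn.Linear(hidden, enc_dim)
        self.score = nn.Linear(hidden + enc_dim, n_labels)
        self.label_corr = nn.Linear(n_labels, n_labels)        # pairwise label layer
        if cooccur is not None:                                # assumed (n_labels, n_labels)
            with torch.no_grad():
                self.label_corr.weight.copy_(cooccur)

    def step(self, prev_label, hc, enc_states):
        h, c = hc
        # dot-product attention over the encoder states
        query = self.attn(h).unsqueeze(2)                      # (B, enc_dim, 1)
        alpha = F.softmax(torch.bmm(enc_states, query).squeeze(2), dim=1)
        context = torch.bmm(alpha.unsqueeze(1), enc_states).squeeze(1)
        h, c = self.cell(torch.cat([self.label_embed(prev_label), context], -1), (h, c))
        logits = self.label_corr(self.score(torch.cat([h, context], -1)))
        return logits, (h, c)                                  # next label from logits

# One decoding step on random data (RCV1-v2 has 103 topic labels)
enc, dec = Encoder(vocab_size=30000), Decoder(n_labels=103, enc_dim=2 * 256)
states = enc(torch.randint(0, 30000, (4, 120)))
h = c = torch.zeros(4, 256)
bos = torch.full((4,), 103, dtype=torch.long)                  # start from <bos>
logits, (h, c) = dec.step(bos, (h, c), states)
```

At inference time the `step` call would be repeated, feeding back the predicted label at each step until a stop label is emitted, as is standard in sequence-generation models such as SGM.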
