Deep Convolution Neural Network for Extreme Multi-label Text Classification

In this paper we present an analysis of the use of deep neural networks for extreme multi-label and multi-class text classification. We consider two network models: the first consists of a word embeddings (WE) stage followed by two dense layers, hereinafter Dense; the second adds a convolutional stage between the WE stage and the dense layers, hereinafter CNN-Dense. We take into account classification problems with different numbers of labels, ranging from the order of 10 to the order of 30,000, showing how the performance of the neural networks varies with the total number of labels and the average number of labels per sample, exploiting the hierarchical structure of the label space of the dataset used for the experimental assessment. It is worth noting that multi-label classification is a harder problem than multi-class classification, due to the variable number of labels associated with each sample. We also investigate the behaviour of the neural networks as a function of the training hyperparameters, analysing the link between them and the complexity of the dataset. All results are evaluated on the PubMed scientific articles collection, used as the experimental benchmark.
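To make the two architectures concrete, the following is a minimal sketch of how the Dense and CNN-Dense models described above could be expressed with tf.keras. All vocabulary, embedding, layer and kernel sizes here are illustrative assumptions, not the configuration reported in the paper; the key architectural points are the word-embedding input stage, the optional convolutional stage, and the sigmoid output layer that treats each label as an independent binary decision in the multi-label setting.

```python
# Hedged sketch of the two models (Dense and CNN-Dense); sizes are assumptions.
import tensorflow as tf
from tensorflow.keras import layers, models

VOCAB_SIZE = 50_000   # assumed vocabulary size
EMBED_DIM = 300       # assumed word-embedding dimension
MAX_LEN = 400         # assumed (padded) document length in tokens
NUM_LABELS = 30_000   # upper end of the label-space sizes considered in the paper

def build_dense(num_labels: int = NUM_LABELS) -> tf.keras.Model:
    """WE stage followed by two dense layers ("Dense" model)."""
    inputs = layers.Input(shape=(MAX_LEN,), dtype="int32")
    x = layers.Embedding(VOCAB_SIZE, EMBED_DIM)(inputs)
    x = layers.GlobalAveragePooling1D()(x)           # collapse the sequence axis
    x = layers.Dense(512, activation="relu")(x)      # first dense layer (assumed width)
    x = layers.Dense(512, activation="relu")(x)      # second dense layer (assumed width)
    # Sigmoid (not softmax) output: each label is predicted independently,
    # which is what distinguishes multi-label from multi-class classification.
    outputs = layers.Dense(num_labels, activation="sigmoid")(x)
    return models.Model(inputs, outputs)

def build_cnn_dense(num_labels: int = NUM_LABELS) -> tf.keras.Model:
    """Same as Dense, with a convolutional stage between WE and the dense layers ("CNN-Dense")."""
    inputs = layers.Input(shape=(MAX_LEN,), dtype="int32")
    x = layers.Embedding(VOCAB_SIZE, EMBED_DIM)(inputs)
    x = layers.Conv1D(256, kernel_size=5, activation="relu")(x)  # assumed filters / kernel size
    x = layers.GlobalMaxPooling1D()(x)
    x = layers.Dense(512, activation="relu")(x)
    x = layers.Dense(512, activation="relu")(x)
    outputs = layers.Dense(num_labels, activation="sigmoid")(x)
    return models.Model(inputs, outputs)

# Per-label binary cross-entropy is the usual training loss for multi-label targets.
model = build_cnn_dense()
model.compile(optimizer="adam", loss="binary_crossentropy")
```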
