EEMC: Embedding Enhanced Multi-tag Classification

Representation learning has recently achieved impressive performance in NLP and complex-network analysis, and it is becoming a fundamental technique in machine learning and data mining. How to use representation learning to improve classifier performance is therefore a significant research direction. We use representation learning to map raw data (the nodes of a graph) into a low-dimensional feature space. In this space each raw data point obtains a low-dimensional vector representation; we apply simple linear operations to these vectors to produce virtual data, and we train a multi-tag classifier on both the original vectors and the virtual data. We measure classifier performance with F1 scores (Macro-F1 and Micro-F1). As a baseline, we also train the classifier directly on the low-dimensional vectors alone and measure its performance. On three public data sets, our method raises Macro-F1 by 28%-450% and the average F1 score by 12%-224%; the virtual data consistently helps the classifier improve its F1 score, so our algorithm is an effective way to improve classifier performance. These results suggest that virtual data generated by simple linear operations in the representation space still retains the information of the raw data, which is also of great significance for learning from small-sample data sets.
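
The abstract does not specify the pipeline's implementation, but it can be sketched as follows. This is a minimal illustration, assuming the node embeddings are already available as a NumPy array `X` with a multi-label indicator matrix `Y`; the helper `make_virtual_data`, the convex-combination scheme, and the coefficient `alpha` are assumptions for illustration, since the abstract only says "simple linear operations", and the label rule for virtual samples (logical OR of the pair's tags) is likewise an assumption.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OneVsRestClassifier

def make_virtual_data(X, Y, n_virtual, alpha=0.5, seed=0):
    """Generate virtual samples as convex combinations of random pairs
    of embedding vectors; tags are merged with logical OR.
    (The exact linear operation is an assumption -- the abstract only
    says 'simple linear operations' in the representation space.)"""
    rng = np.random.default_rng(seed)
    i = rng.integers(0, len(X), size=n_virtual)
    j = rng.integers(0, len(X), size=n_virtual)
    X_virt = alpha * X[i] + (1 - alpha) * X[j]
    Y_virt = np.logical_or(Y[i], Y[j]).astype(Y.dtype)
    return X_virt, Y_virt

# Toy stand-in for precomputed node embeddings and multi-tag targets.
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 64))                 # 500 nodes, 64-dim embeddings
Y = (rng.random((500, 5)) < 0.3).astype(int)   # 5 binary tags per node

X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.3, random_state=42)

# Baseline: train directly on the low-dimensional vectors.
base = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(X_tr, Y_tr)

# Embedding-enhanced variant: augment the training set with virtual data.
X_virt, Y_virt = make_virtual_data(X_tr, Y_tr, n_virtual=len(X_tr))
aug = OneVsRestClassifier(LogisticRegression(max_iter=1000)).fit(
    np.vstack([X_tr, X_virt]), np.vstack([Y_tr, Y_virt]))

# Compare Macro-F1 and Micro-F1, as in the paper's evaluation.
for name, clf in [("baseline", base), ("augmented", aug)]:
    P = clf.predict(X_te)
    print(name,
          "Macro-F1:", f1_score(Y_te, P, average="macro", zero_division=0),
          "Micro-F1:", f1_score(Y_te, P, average="micro", zero_division=0))
```

On real data the embeddings would come from a method such as DeepWalk or node2vec rather than random noise; the sketch only shows how virtual vectors are folded into training and how the two classifiers are compared.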
