Semi-supervised Multi-label Learning for Graph-structured Data

Semi-supervised multi-label classification has so far been studied primarily on Euclidean data, such as text (a 1D grid of tokens) and images (a 2D grid of pixels). However, non-Euclidean graph-structured data arise naturally and frequently in semi-supervised multi-label learning tasks across domains such as social networks, citation networks, and protein-protein interaction (PPI) networks. Moreover, popular node embedding methods such as Graph Neural Networks (GNNs) are designed for graphs in which each node carries a single label and tend to neglect label correlations in the multi-label setting, so a straightforward adaptation of them proves empirically ineffective. Graph representation learning for semi-supervised multi-label learning is therefore both crucial and challenging. In this work, we incorporate label embedding into our proposed model to capture both network topology and higher-order multi-label correlations. The label embeddings are generated along with the node embeddings from the topological structure and serve as a prototype center for each class. In addition, the similarity between the label embeddings and a node's embedding serves as a confidence vector that guides the label smoothing process, which is formulated as a margin ranking optimization problem to learn second-order relations between labels. Extensive experiments on real-world datasets from various domains demonstrate that our model significantly outperforms state-of-the-art models on node-level tasks.
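The abstract does not give exact formulas, so the following is only a rough sketch of how the described ingredients could fit together, not the authors' formulation. It assumes cosine similarity between a node embedding and each class's label embedding (the "prototype center"), a softmax over those similarities as the confidence vector, a simple convex mix of the hard multi-hot target with that confidence vector as label smoothing, and a pairwise hinge loss (margin ranking) that pushes every positive label's score above every negative label's score. The function names and the `temperature`, `alpha`, and `margin` parameters are hypothetical.

```python
import math
from typing import List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return dot / (nu * nv)


def confidence_vector(node_emb: Sequence[float],
                      label_embs: Sequence[Sequence[float]],
                      temperature: float = 1.0) -> List[float]:
    """Hypothetical choice: softmax over node-to-label cosine similarities."""
    sims = [cosine(node_emb, e) / temperature for e in label_embs]
    m = max(sims)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in sims]
    z = sum(exps)
    return [x / z for x in exps]


def smooth_labels(y: Sequence[int], conf: Sequence[float],
                  alpha: float = 0.1) -> List[float]:
    """Mix the hard multi-hot target with the confidence vector (assumed form)."""
    return [(1 - alpha) * yi + alpha * ci for yi, ci in zip(y, conf)]


def margin_ranking_loss(scores: Sequence[float], y: Sequence[int],
                        margin: float = 0.5) -> float:
    """Mean hinge loss max(0, margin - (s_pos - s_neg)) over all label pairs."""
    pos = [s for s, yi in zip(scores, y) if yi == 1]
    neg = [s for s, yi in zip(scores, y) if yi == 0]
    if not pos or not neg:
        return 0.0
    total = sum(max(0.0, margin - (sp - sn)) for sp in pos for sn in neg)
    return total / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy example: three classes with 2-D label embeddings, one node.
    label_embs = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
    node_emb = [1.0, 1.0]
    y = [1, 1, 0]  # node belongs to classes 0 and 1
    conf = confidence_vector(node_emb, label_embs)
    scores = [cosine(node_emb, e) for e in label_embs]
    print("confidence:", conf)
    print("smoothed target:", smooth_labels(y, conf))
    print("ranking loss:", margin_ranking_loss(scores, y))
```

In this toy case the node lies equally close to the prototypes of its two positive classes and far from the negative one, so both positive scores exceed the negative score by more than the margin and the ranking loss is zero.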
