SEAL: Semisupervised Adversarial Active Learning on Attributed Graphs

Active learning (AL) on attributed graphs has received increasing attention with the prevalence of graph-structured data. Although AL has been widely studied as a way to alleviate label sparsity in conventional nonrelational data, how to make it effective on attributed graphs remains an open research question. Existing AL algorithms for node classification attempt to reuse the classic AL query strategies designed for nonrelational data. However, they suffer from two major limitations. First, different AL query strategies, calculated in distinct scoring spaces, are often naively combined to determine which nodes to label. Second, the AL query engine and the learning of the classifier are treated as two separate processes, resulting in unsatisfactory performance. In this article, we propose a SEmisupervised Adversarial active Learning (SEAL) framework on attributed graphs, which fully leverages the representation power of deep neural networks and devises a novel AL query strategy for node classification in an adversarial way. Our framework learns two adversarial components: a graph embedding network that encodes both the unlabeled and labeled nodes into a common latent space, aiming to trick the discriminator into regarding all nodes as already labeled, and a semisupervised discriminator network that distinguishes the unlabeled nodes from the existing labeled nodes. The divergence score, generated by the discriminator in a unified latent space, serves as the informativeness measure used to actively select the most informative node to be labeled by an oracle. The two adversarial components form a closed loop in which they mutually and simultaneously reinforce each other, enhancing AL performance. Extensive experiments on real-world networks validate the effectiveness of the SEAL framework, which improves on state-of-the-art baselines in node classification tasks.
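
The query step described above can be illustrated with a minimal sketch. It is not the SEAL implementation: the linear discriminator, the use of one minus the predicted "labeled" probability as a stand-in for the divergence score, and all names (`discriminator_prob_labeled`, `select_query`, `w`, `b`) are assumptions made only for illustration. The embeddings are taken as given rather than produced by a trained graph embedding network.

```python
import numpy as np

def discriminator_prob_labeled(embeddings, w, b):
    """Hypothetical linear discriminator: probability that each node is 'labeled'."""
    logits = embeddings @ w + b
    return 1.0 / (1.0 + np.exp(-logits))

def select_query(embeddings, labeled_mask, w, b):
    """Return the index of the unlabeled node that looks least 'labeled-like'."""
    p_labeled = discriminator_prob_labeled(embeddings, w, b)
    # Assumed stand-in for the divergence score: 1 - P(labeled).
    divergence = 1.0 - p_labeled
    # Already-labeled nodes are never sent to the oracle again.
    divergence = np.where(labeled_mask, -np.inf, divergence)
    return int(np.argmax(divergence))

# Toy latent embeddings for four nodes; node 0 is already labeled.
embeddings = np.array([[2.0, 0.0], [1.5, 0.5], [-1.0, 0.2], [-3.0, 0.1]])
labeled_mask = np.array([True, False, False, False])
w = np.array([1.0, 0.0])  # hypothetical discriminator weights
b = 0.0                   # hypothetical discriminator bias

query = select_query(embeddings, labeled_mask, w, b)
print(query)
```

In a full adversarial loop, the selected node would be labeled by the oracle, added to the labeled set, and both networks would then be updated before the next query.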
