Active Learning for Node Classification: The Additional Learning Ability from Unlabelled Nodes

Node classification on graph data is an important task in many practical domains. However, it requires labels for training, which can be difficult or expensive to obtain in practice. Given a limited labelling budget, active learning aims to improve performance by carefully choosing which nodes to label. Our empirical study shows that existing active learning methods for node classification are considerably outperformed by a simple baseline that randomly selects nodes to label and trains a linear classifier on the labelled nodes using unsupervised learning features. This indicates that existing methods do not fully utilize the information in unlabelled nodes, as they use unlabelled nodes only for label acquisition. In this paper, we exploit the information in unlabelled nodes through unsupervised learning features and propose a novel latent space clustering-based active learning method for node classification (LSCALE). Specifically, to select nodes for labelling, our method runs the K-Medoids clustering algorithm on a feature space formed by dynamically combining unsupervised and supervised features. In addition, we design an incremental clustering module to avoid redundancy among nodes selected at different steps. Extensive experiments on three public citation datasets and two co-authorship datasets show that LSCALE consistently outperforms state-of-the-art approaches by a large margin.
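
To make the selection step concrete, below is a minimal Python sketch of K-Medoids-style node selection on a combined feature space. The helper names (combine_features, select_medoids), the fixed mixing weight alpha, and the plain alternating K-Medoids loop are illustrative assumptions; LSCALE's actual dynamic feature combination and incremental clustering module are not reproduced here.

import numpy as np
from scipy.spatial.distance import cdist

def combine_features(unsup_feats, sup_feats, alpha=0.5):
    # Hypothetical combination: row-normalise each feature matrix and concatenate
    # with a fixed weight; the paper's dynamic weighting is not reproduced here.
    u = unsup_feats / (np.linalg.norm(unsup_feats, axis=1, keepdims=True) + 1e-12)
    s = sup_feats / (np.linalg.norm(sup_feats, axis=1, keepdims=True) + 1e-12)
    return np.hstack([alpha * u, (1.0 - alpha) * s])

def select_medoids(features, unlabelled_idx, budget, n_iter=10, seed=0):
    # Pick `budget` representative unlabelled nodes with a plain K-Medoids run
    # (alternating assignment / medoid update) on the combined feature space.
    rng = np.random.default_rng(seed)
    unlabelled_idx = np.asarray(unlabelled_idx)
    X = features[unlabelled_idx]
    dist = cdist(X, X)                               # pairwise distances among unlabelled nodes
    medoids = rng.choice(len(X), size=budget, replace=False)
    for _ in range(n_iter):
        assign = np.argmin(dist[:, medoids], axis=1) # nearest medoid for each node
        for k in range(budget):
            members = np.where(assign == k)[0]
            if members.size == 0:
                continue
            # New medoid: the member minimising total distance within its cluster.
            within = dist[np.ix_(members, members)].sum(axis=1)
            medoids[k] = members[np.argmin(within)]
    return unlabelled_idx[medoids]                   # node indices to send for labelling

# Example usage (random features standing in for unsupervised embeddings and
# supervised hidden representations):
# feats = combine_features(np.random.rand(100, 64), np.random.rand(100, 32))
# query = select_medoids(feats, unlabelled_idx=np.arange(100), budget=10)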
