Confidence May Cheat: Self-Training on Graph Neural Networks under Distribution Shift

Graph Convolutional Networks (GCNs) have recently attracted vast interest and achieved state-of-the-art performance on graphs, but their success typically hinges on careful training with large amounts of expensive and time-consuming labeled data. To alleviate the scarcity of labeled data, self-training methods have been widely adopted on graphs: they label high-confidence unlabeled nodes and add them to the training set. Along this line, we conduct a thorough empirical study of current self-training methods on graphs. Surprisingly, we find that high-confidence unlabeled nodes are not always useful and can even introduce a distribution shift between the original labeled dataset and the dataset augmented by self-training, severely hindering self-training on graphs. To this end, we propose a novel Distribution Recovered Graph Self-Training framework (DR-GST), which recovers the distribution of the original labeled dataset. Specifically, we first prove that the loss function of the self-training framework under distribution shift equals that under the population distribution, provided each pseudo-labeled node is weighted by a proper coefficient. Since this coefficient is intractable, we propose to replace it with the information gain, having observed that the two share the same trend; in DR-GST, the information gain is estimated via both dropout variational inference and DropEdge variational inference. However, such a weighted loss function amplifies the impact of incorrect pseudo labels, so we further apply loss correction to improve the quality of the pseudo labels. Both our theoretical analysis and extensive experiments on five benchmark datasets demonstrate the effectiveness of DR-GST and of each of its well-designed components.

∗Work done during internship at Ant Group. †Corresponding author.

WWW ’22, April 25–29, 2022, Virtual Event, Lyon, France. The definitive Version of Record was published in Proceedings of the ACM Web Conference 2022 (WWW ’22), https://doi.org/10.1145/3485447.3512172.

CCS CONCEPTS: • Computing methodologies → Neural networks; • Theory of computation → Social networks; Semi-supervised learning.
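To make the information-gain estimate concrete, below is a minimal sketch of dropout variational inference for a node classifier, assuming a PyTorch GCN whose forward pass takes a feature matrix and an adjacency matrix; the function name and the `model(features, adj)` signature are illustrative assumptions, not the paper's API. The quantity computed is the mutual information between a node's prediction and the model weights: the entropy of the mean prediction minus the mean entropy over stochastic forward passes.

```python
import torch
import torch.nn.functional as F

@torch.no_grad()
def information_gain(model, features, adj, n_samples=20):
    """Per-node information gain via Monte Carlo dropout: keep dropout
    active at inference time, average over stochastic forward passes,
    and return H(mean prediction) - mean(H(prediction))."""
    model.train()  # keep dropout layers on (dropout variational inference)
    probs = torch.stack([
        F.softmax(model(features, adj), dim=-1)  # hypothetical GCN forward
        for _ in range(n_samples)
    ])                                           # (n_samples, n_nodes, n_classes)
    mean_p = probs.mean(dim=0)
    h_mean = -(mean_p * mean_p.clamp_min(1e-12).log()).sum(-1)        # predictive entropy
    mean_h = -(probs * probs.clamp_min(1e-12).log()).sum(-1).mean(0)  # expected entropy
    return h_mean - mean_h  # mutual information, one value per node
```

A DropEdge-based variant would replace the dropout stochasticity with a freshly edge-dropped adjacency matrix in each forward pass.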

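Likewise, a hedged sketch of how the information-gain weights and a loss correction step could enter the training objective, assuming forward correction through a row-stochastic class-transition matrix; the function signature, the way the two loss terms are summed, and the correction mechanism are assumptions for illustration, not the paper's exact formulation.

```python
import torch
import torch.nn.functional as F

def dr_gst_loss(logits, labels, pseudo_labels, train_mask, pseudo_mask,
                weights, transition=None):
    """Supervised cross-entropy on labeled nodes plus an information-gain-
    weighted cross-entropy on pseudo-labeled nodes, with optional forward
    loss correction through a row-stochastic class-transition matrix."""
    loss_labeled = F.cross_entropy(logits[train_mask], labels[train_mask])
    probs = F.softmax(logits[pseudo_mask], dim=-1)
    if transition is not None:       # forward correction: p(noisy label) = p @ T
        probs = probs @ transition
    nll = -probs.clamp_min(1e-12).log().gather(
        1, pseudo_labels[pseudo_mask].unsqueeze(1)).squeeze(1)
    loss_pseudo = (weights[pseudo_mask] * nll).mean()
    return loss_labeled + loss_pseudo
```

Here `weights` would hold the per-node information gain from the sketch above (suitably normalized); how DR-GST actually combines and normalizes these terms is specified in the paper itself.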