Adaptive Label Smoothing To Regularize Large-Scale Graph Training

Graph neural networks (GNNs), which learn node representations by recursively aggregating information from each node's neighbors, have become a predominant computational tool in many domains. To handle large-scale graphs, most existing methods partition the input graph into multiple sub-graphs (e.g., through node clustering) and apply batch training to reduce memory cost. However, such batch training leads to label bias within each batch, which in turn results in over-confident model predictions. Because connected nodes with positively related labels tend to be assigned to the same batch, conventional cross-entropy minimization focuses on the predictions of the biased classes in that batch and may intensify overfitting. To overcome the label bias problem, we propose adaptive label smoothing (ALS), which replaces the one-hot hard labels with smoothed ones and learns to reallocate label confidence from the biased classes to the others. Specifically, ALS propagates node labels to aggregate the neighborhood label distribution in a pre-processing step, and then updates the optimal smoothed labels online to adapt to the specific graph structure. Experiments on real-world datasets demonstrate that ALS can be applied generally to the main scalable learning frameworks to calibrate the biased labels and improve generalization performance.
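The two steps described above (propagating labels to obtain a neighborhood label distribution, then training against smoothed targets) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the mean-aggregation propagation rule, the number of hops, and the fixed mixing weight `alpha` are all assumptions made for illustration, and the sketch treats every node as labeled. In ALS the smoothed labels are updated online during training rather than fixed by a constant.

```python
import numpy as np


def propagate_labels(adj, onehot, num_hops=2):
    """Pre-processing step: average labels over neighbors for num_hops rounds.

    Assumption: mean aggregation over the adjacency matrix with self-loops.
    """
    a = adj + np.eye(adj.shape[0])           # add self-loops
    a = a / a.sum(axis=1, keepdims=True)     # row-normalize so each row sums to 1
    dist = onehot.astype(float)
    for _ in range(num_hops):
        dist = a @ dist                      # one hop of label aggregation
    return dist


def smooth_labels(onehot, neighbor_dist, alpha=0.1):
    """Mix hard labels with the neighborhood label distribution.

    Assumption: alpha is a fixed scalar here; the paper learns the
    smoothing online instead.
    """
    return (1.0 - alpha) * onehot + alpha * neighbor_dist


def soft_cross_entropy(logits, targets):
    """Cross-entropy between soft targets and softmax(logits), averaged over nodes."""
    shifted = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))
    return float(-(targets * log_probs).sum(axis=1).mean())


# Toy example: a 4-node path graph 0-1-2-3 with two classes.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 1, 0],
                [0, 1, 0, 1],
                [0, 0, 1, 0]], dtype=float)
labels = np.array([0, 0, 1, 1])
onehot = np.eye(2)[labels]

neighbor_dist = propagate_labels(adj, onehot, num_hops=2)
smoothed = smooth_labels(onehot, neighbor_dist, alpha=0.1)
loss = soft_cross_entropy(np.zeros((4, 2)), smoothed)
```

Each row of `smoothed` remains a valid probability distribution because it is a convex combination of two distributions, and with a small `alpha` the original class keeps the largest mass. Nodes near a class boundary (nodes 1 and 2 in the toy graph) receive more mass on the other class than nodes in the interior of a class region.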
