Bag of Tricks for Training Deeper Graph Neural Networks: A Comprehensive Benchmark Study

Training deep graph neural networks (GNNs) is notoriously hard. Besides the standard plights of training deep architectures, such as vanishing gradients and overfitting, deep GNNs also uniquely suffer from over-smoothing, information squashing, and related problems, which limit their potential power for encoding the high-order neighbor structure in large-scale graphs. Although numerous efforts have been proposed to address these limitations, such as various forms of skip connections, graph normalization, and random dropping, it is difficult to disentangle the advantages brought by a deep GNN architecture from those “tricks” necessary to train such an architecture. Moreover, the lack of a standardized benchmark with fair and consistent experimental settings poses an almost insurmountable obstacle to gauging the effectiveness of new mechanisms. In view of these issues, we present the first fair and reproducible benchmark dedicated to assessing the “tricks” of training deep GNNs. We categorize existing approaches, investigate their hyperparameter sensitivity, and unify the basic configuration. Comprehensive evaluations are then conducted on tens of representative graph datasets, including the recent large-scale Open Graph Benchmark, with diverse deep GNN backbones. We demonstrate that an organic combination of initial connection, identity mapping, group normalization, and batch normalization attains new state-of-the-art results for deep GNNs on large datasets. Code is available at https://github.com/VITA-Group/Deep_GCN_Benchmarking.
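As a rough illustration of two of the tricks named above, the sketch below combines an initial connection with identity mapping in a single propagation step. The abstract does not give a formula, so the form used here, ReLU(((1 - alpha) P H + alpha H0)((1 - beta) I + beta W)), is an assumption borrowed from the GCNII-style layer commonly associated with these terms. The function names, the `alpha` and `beta` parameters, and the toy graph are hypothetical. Group and batch normalization are left out, and this is not the benchmark's implementation.

```python
import math


def matmul(a, b):
    """Multiply two dense matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def normalized_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2}, the self-loop-augmented normalization."""
    n = len(adj)
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def initial_identity_layer(p, h, h0, w, alpha, beta):
    """One assumed GCNII-style layer (hypothetical helper, not the authors' code)."""
    ph = matmul(p, h)
    # Initial connection: mix propagated features with the first-layer features H0.
    mixed = [
        [(1 - alpha) * ph[i][j] + alpha * h0[i][j] for j in range(len(h0[0]))]
        for i in range(len(h0))
    ]
    # Identity mapping: pull the weight matrix toward the identity.
    d = len(w)
    w_id = [
        [(1 - beta) * (1.0 if i == j else 0.0) + beta * w[i][j] for j in range(d)]
        for i in range(d)
    ]
    out = matmul(mixed, w_id)
    # ReLU nonlinearity.
    return [[max(0.0, v) for v in row] for row in out]


# Toy usage: two nodes joined by one edge, two features per node.
adj = [[0.0, 1.0], [1.0, 0.0]]
p = normalized_adjacency(adj)
h0 = [[1.0, 0.0], [0.0, 1.0]]
w = [[0.5, -0.2], [0.1, 0.3]]
h1 = initial_identity_layer(p, h0, h0, w, alpha=0.1, beta=0.5)
```

With `alpha = 1` and `beta = 0` the layer returns the (non-negative) initial features unchanged, which shows how these two terms keep a deep stack from drifting away from its input.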
