Bag of Tricks for Training Deeper Graph Neural Networks: A Comprehensive Benchmark Study

Training deep graph neural networks (GNNs) is notoriously hard. Besides the standard plights of training deep architectures, such as vanishing gradients and overfitting, deep GNNs also uniquely suffer from over-smoothing, information squashing, and related problems, which limit their potential power on large-scale graphs. Although numerous remedies have been proposed to address these limitations, such as various forms of skip connections, graph normalization, and random dropping, it is difficult to disentangle the advantages brought by a deep GNN architecture from the “tricks” necessary to train it. Moreover, the lack of a standardized benchmark with fair and consistent experimental settings poses an almost insurmountable obstacle to gauging the effectiveness of new mechanisms. In view of these issues, we present the first fair and reproducible benchmark dedicated to assessing the “tricks” of training deep GNNs. We categorize existing approaches, investigate their hyperparameter sensitivity, and unify the basic configuration. Comprehensive evaluations are then conducted on tens of representative graph datasets, including the recent large-scale Open Graph Benchmark (OGB), with diverse deep GNN backbones. Based on synergistic studies, we discover a combination of superior training tricks that leads us to new state-of-the-art results for deep GCNs across multiple representative graph datasets. We demonstrate that an organic combination of initial connection, identity mapping, group normalization, and batch normalization performs best on large datasets. Experiments also reveal a number of “surprises” when combining or scaling up some of the tricks. All code is available at https://github.com/VITA-Group/Deep_GCN_Benchmarking.
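As a rough illustration of two of the tricks named above, initial connection and identity mapping, the following minimal sketch applies propagation steps of the form H' = ReLU(((1 - alpha) P H + alpha H0)((1 - beta) I + beta W)), the update commonly associated with GCNII-style models. It is written in plain Python for readability. The function names, the toy graph, and the values of alpha and beta are illustrative assumptions, not the benchmark's code or its tuned settings.

```python
import math

def matmul(a, b):
    """Multiply two dense matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def normalized_adjacency(edges, n):
    """Return D^-1/2 (A + I) D^-1/2 for an undirected graph on n nodes."""
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for u, v in edges:
        a[u][v] = a[v][u] = 1.0
    deg = [sum(row) for row in a]
    return [[a[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]

def layer(p, h, h0, w, alpha, beta):
    """One propagation step with initial connection and identity mapping."""
    ph = matmul(p, h)
    # Initial connection: blend propagated features with the input features h0.
    mixed = [[(1 - alpha) * x + alpha * y for x, y in zip(r, r0)]
             for r, r0 in zip(ph, h0)]
    # Identity mapping: pull the weight matrix towards the identity.
    d = len(w)
    w_id = [[(1 - beta) * (1.0 if i == j else 0.0) + beta * w[i][j]
             for j in range(d)] for i in range(d)]
    out = matmul(mixed, w_id)
    return [[max(0.0, x) for x in row] for row in out]  # ReLU

# Toy example (assumed values): a 3-node path graph with 2-dimensional features.
p = normalized_adjacency([(0, 1), (1, 2)], 3)
h0 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w = [[0.5, -0.2], [0.1, 0.3]]

h = h0
for _ in range(8):  # stack 8 layers that share one weight matrix, for brevity
    h = layer(p, h, h0, w, alpha=0.1, beta=0.5)
```

With alpha = 0 and beta = 1 the step reduces to a plain GCN-style layer, so the two coefficients can be read as controlling how much of the input features and of the identity transform survives at each depth.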
