Large-scale graph representation learning with very deep GNNs and self-supervision

Effectively and efficiently deploying graph neural networks (GNNs) at scale remains one of the most challenging aspects of graph representation learning. Many powerful solutions have only ever been validated on comparatively small datasets, often with counter-intuitive outcomes; the Open Graph Benchmark Large-Scale Challenge (OGB-LSC) was designed to break this barrier. We entered the OGB-LSC with two large-scale GNNs: a deep transductive node classifier powered by bootstrapping, and a very deep (up to 50-layer) inductive graph regressor regularised by denoising objectives. Our models achieved award-level (top-3) performance on both the MAG240M and PCQM4M benchmarks. In doing so, we provide evidence of scalable self-supervised graph representation learning and of the utility of very deep GNNs, both important open issues. Our code is publicly available at: https://github.com/deepmind/deepmind-research/tree/master/ogb_lsc.
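To make the "node classifier powered by bootstrapping" concrete, the following is a minimal sketch (not the authors' code, which lives in the linked repository) of a BYOL/BGRL-style bootstrapping objective: an online encoder is trained to match the output of a slowly-moving target encoder on a different augmentation of the same graph, with no negative samples. The toy mean-aggregation encoder, the edge-dropping augmentation, and all shapes and names here are illustrative assumptions.

```python
# Minimal numpy sketch of a bootstrapped (BYOL/BGRL-style) graph objective.
# Everything below is illustrative; it is not the authors' implementation.
import numpy as np

def encode(node_feats, adj, weights):
    """One round of mean-neighbour aggregation followed by a linear map and tanh."""
    deg = adj.sum(axis=1, keepdims=True) + 1e-8
    aggregated = adj @ node_feats / deg          # average neighbour features
    return np.tanh(aggregated @ weights)

def bootstrap_loss(online_repr, target_repr):
    """Negative cosine similarity between online and (stop-gradient) target representations."""
    online_repr = online_repr / np.linalg.norm(online_repr, axis=1, keepdims=True)
    target_repr = target_repr / np.linalg.norm(target_repr, axis=1, keepdims=True)
    return -(online_repr * target_repr).sum(axis=1).mean()

def ema_update(target_w, online_w, decay=0.99):
    """Target parameters track the online parameters via an exponential moving average."""
    return decay * target_w + (1.0 - decay) * online_w

# Toy example: 4 nodes, 8 features, two "views" produced by random edge dropping.
rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))
adj = (rng.random((4, 4)) < 0.5).astype(float)
view1 = adj * (rng.random(adj.shape) < 0.8)      # drop ~20% of edges
view2 = adj * (rng.random(adj.shape) < 0.8)

w_online = rng.normal(size=(8, 16)) * 0.1
w_target = w_online.copy()

loss = bootstrap_loss(encode(x, view1, w_online), encode(x, view2, w_target))
w_target = ema_update(w_target, w_online)        # only the target is updated by EMA
print(f"bootstrapping loss: {loss:.4f}")
```

In the full method the online parameters are updated by gradient descent on this loss (with the target branch treated as a constant), while the target parameters are updated only through the moving average, which is what makes the objective self-supervised rather than contrastive.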
