Large-Scale Representation Learning on Graphs via Bootstrapping

Self-supervised learning provides a promising path towards eliminating the need for costly label information in representation learning on graphs. However, to achieve state-of-the-art performance, methods often require large numbers of negative examples and rely on complex augmentations. This can be prohibitively expensive, especially for large graphs. To address these challenges, we introduce Bootstrapped Graph Latents (BGRL), a graph representation learning method that learns by predicting alternative augmentations of the input. BGRL uses only simple augmentations, alleviates the need for contrasting with negative examples, and is thus scalable by design. BGRL outperforms or matches prior methods on several established benchmarks, while achieving a 2-10x reduction in memory costs. Furthermore, we show that BGRL can be scaled up to extremely large graphs with hundreds of millions of nodes in the semi-supervised regime, achieving state-of-the-art performance and improving over supervised baselines where representations are shaped only through label information. In particular, our solution centered on BGRL constituted one of the winning entries to the Open Graph Benchmark - Large Scale Challenge at KDD Cup 2021, on a graph orders of magnitude larger than all previously available benchmarks, thus demonstrating the scalability and effectiveness of our approach.
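To make the bootstrapping idea concrete, the following is a minimal sketch of a BGRL-style training step in PyTorch, under several simplifying assumptions: a toy dense-adjacency mean-aggregation encoder stands in for a real GNN, and the names `Encoder`, `augment`, and `train_step` are illustrative, not taken from the authors' code. An online encoder predicts a target encoder's embedding of a differently augmented view, no negatives are contrasted anywhere, and the target's parameters track the online network's via an exponential moving average.

```python
# Hedged sketch of a BGRL-style update: simple augmentations (feature masking,
# edge dropping), an online/target encoder pair, and a cosine prediction loss.
# All names and hyperparameters here are illustrative assumptions.
import copy
import torch
import torch.nn.functional as F

class Encoder(torch.nn.Module):
    """Toy one-layer graph encoder: mean-aggregate neighbors, then a linear map."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.lin = torch.nn.Linear(in_dim, out_dim)

    def forward(self, x, adj):
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
        return F.relu(self.lin((adj @ x) / deg))

def augment(x, adj, p_feat=0.2, p_edge=0.3):
    """The two simple augmentations used by BGRL: node-feature masking and edge dropping."""
    feat_mask = (torch.rand(1, x.size(1)) > p_feat).float()
    edge_mask = (torch.rand_like(adj) > p_edge).float()
    return x * feat_mask, adj * edge_mask

# Toy graph: 100 nodes, 32 features, random symmetric adjacency.
torch.manual_seed(0)
n, d, h = 100, 32, 64
x = torch.randn(n, d)
adj = (torch.rand(n, n) < 0.05).float()
adj = ((adj + adj.t()) > 0).float()

online = Encoder(d, h)
target = copy.deepcopy(online)          # target network: EMA copy of the online one
for p in target.parameters():
    p.requires_grad_(False)             # the target is never updated by gradients
predictor = torch.nn.Sequential(
    torch.nn.Linear(h, h), torch.nn.ReLU(), torch.nn.Linear(h, h))
opt = torch.optim.Adam(
    list(online.parameters()) + list(predictor.parameters()), lr=1e-3)

def train_step(tau=0.99):
    x1, a1 = augment(x, adj)            # first augmented view
    x2, a2 = augment(x, adj)            # second augmented view
    # Online network predicts the target's embedding of the *other* view;
    # note the complete absence of negative examples.
    q1 = predictor(online(x1, a1))
    q2 = predictor(online(x2, a2))
    with torch.no_grad():
        y1, y2 = target(x1, a1), target(x2, a2)
    loss = -(F.cosine_similarity(q1, y2).mean()
             + F.cosine_similarity(q2, y1).mean())
    opt.zero_grad()
    loss.backward()
    opt.step()
    # Bootstrapping step: move target parameters toward the online parameters.
    with torch.no_grad():
        for pt, po in zip(target.parameters(), online.parameters()):
            pt.mul_(tau).add_(po, alpha=1 - tau)
    return loss.item()

for step in range(5):
    print(f"step {step}: loss = {train_step():.4f}")
```

Because the loss touches only the two views' embeddings (no pairwise comparison against other nodes), memory scales with the number of nodes rather than the number of node pairs, which is the source of the scalability the abstract claims.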
