GNNAutoScale: Scalable and Expressive Graph Neural Networks via Historical Embeddings

We present GNNAutoScale (GAS), a framework for scaling arbitrary message-passing GNNs to large graphs. GAS prunes entire sub-trees of the computation graph by utilizing historical embeddings from prior training iterations, leading to constant GPU memory consumption with respect to input node size without dropping any data. While existing solutions weaken the expressive power of message passing due to sub-sampling of edges or non-trainable propagations, our approach is provably able to maintain the expressive power of the original GNN. We achieve this by providing approximation error bounds of historical embeddings and showing how to tighten them in practice. Empirically, we show that the practical realization of our framework, PyGAS, an easy-to-use extension for PyTorch Geometric, is both fast and memory-efficient, learns expressive node representations, closely resembles the performance of its non-scaling counterparts, and reaches state-of-the-art performance on large-scale graphs.
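The idea of historical embeddings can be illustrated with a small sketch. This is not the PyGAS API: the function names (`mean_aggregate`, `two_layer_forward`), the scalar embeddings, and the parameter-free mean aggregator are all assumptions chosen to keep the example short. In-batch nodes use freshly computed layer-1 embeddings, while out-of-batch neighbors are read from a history table filled in earlier iterations, so each mini-batch only computes embeddings for its own nodes.

```python
from typing import Callable, Dict, List


def mean_aggregate(v: int, adj: Dict[int, List[int]],
                   value_of: Callable[[int], float]) -> float:
    """Mean over the neighbors of v plus v itself (a self-loop)."""
    msgs = [value_of(u) for u in adj[v]] + [value_of(v)]
    return sum(msgs) / len(msgs)


def two_layer_forward(batch: List[int], adj: Dict[int, List[int]],
                      x: List[float], history: List[float]) -> Dict[int, float]:
    """Hypothetical two-layer forward pass over one mini-batch.

    `history[u]` holds the most recent layer-1 embedding of node u.
    """
    batch_set = set(batch)

    # Layer 1: raw input features are assumed available for every node.
    h1 = {v: mean_aggregate(v, adj, lambda u: x[u]) for v in batch}

    # Push fresh layer-1 embeddings of in-batch nodes to the history.
    for v, h in h1.items():
        history[v] = h

    # Layer 2: fresh embeddings in-batch, historical ones out-of-batch.
    def lookup(u: int) -> float:
        return h1[u] if u in batch_set else history[u]

    return {v: mean_aggregate(v, adj, lookup) for v in batch}


# Path graph 0 - 1 - 2 - 3 with scalar features.
adj = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
x = [1.0, 2.0, 3.0, 4.0]
history = [0.0, 0.0, 0.0, 0.0]

first = two_layer_forward([0, 1], adj, x, history)   # node 2 is still stale
two_layer_forward([2, 3], adj, x, history)           # fills the history for 2, 3
second = two_layer_forward([0, 1], adj, x, history)  # node 2 now up to date
full = two_layer_forward([0, 1, 2, 3], adj, x, [0.0] * 4)
```

Because this toy model has no trainable weights, the history becomes exact after one pass over all batches, and `second` agrees with the full-batch result. With trainable weights the historical values lag behind the current parameters, which is the approximation error the abstract refers to bounding.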
