Learning Large Graph Property Prediction via Graph Segment Training

Learning to predict properties of large graphs is challenging because each prediction requires knowledge of an entire graph, while the memory available during training is bounded. Here we propose Graph Segment Training (GST), a general divide-and-conquer framework that enables learning large graph property prediction with a constant memory footprint. GST first partitions a large graph into segments and then backpropagates through only a few segments sampled at each training iteration. We refine the GST paradigm with a historical embedding table that efficiently provides embeddings for the segments not sampled for backpropagation. To mitigate the staleness of historical embeddings, we design two novel techniques. First, we finetune the prediction head to correct the input distribution shift. Second, we introduce Stale Embedding Dropout, which drops a fraction of stale embeddings during training to reduce bias. We evaluate the complete method, GST-EFD (all techniques combined), on two large graph property prediction benchmarks, MalNet and TpuGraphs. Our experiments show that GST-EFD is both memory-efficient and fast, while offering a slight boost in test accuracy over a typical full-graph training regime.
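
The abstract describes the training scheme only at a high level. As a minimal illustrative sketch (not the authors' implementation), the PyTorch snippet below assumes a hypothetical `SegmentEncoder`, a plain Python dictionary as the historical embedding table, mean pooling as the segment aggregator, and random skipping of stale entries as a stand-in for Stale Embedding Dropout; all of these names, signatures, and design choices are assumptions made for illustration only.

```python
import random
import torch
import torch.nn as nn

class SegmentEncoder(nn.Module):
    """Hypothetical stand-in for a GNN that encodes one graph segment."""
    def __init__(self, in_dim, hid_dim):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(in_dim, hid_dim), nn.ReLU(), nn.Linear(hid_dim, hid_dim))

    def forward(self, segment_x):
        # Mean-pool node features as a placeholder for message passing.
        return self.mlp(segment_x).mean(dim=0)

class GSTModel(nn.Module):
    def __init__(self, in_dim=16, hid_dim=64, num_classes=2):
        super().__init__()
        self.encoder = SegmentEncoder(in_dim, hid_dim)
        self.head = nn.Linear(hid_dim, num_classes)

def train_step(model, optimizer, segments, history, graph_id, label,
               k_backprop=2, stale_dropout_p=0.5):
    """One GST-style update: backpropagate through k sampled segments and
    reuse (or randomly drop) stale historical embeddings for the rest."""
    model.train()
    optimizer.zero_grad()
    idx = list(range(len(segments)))
    random.shuffle(idx)
    sampled, rest = idx[:k_backprop], idx[k_backprop:]

    embs = []
    for i in sampled:
        h = model.encoder(segments[i])        # gradients flow through these segments
        history[(graph_id, i)] = h.detach()   # refresh the historical embedding table
        embs.append(h)
    for i in rest:
        h = history.get((graph_id, i))
        if h is None or random.random() < stale_dropout_p:
            continue                          # "Stale Embedding Dropout": skip stale entries
        embs.append(h)                        # detached, so no backprop through stale embeddings

    graph_emb = torch.stack(embs).mean(dim=0)  # aggregate segment embeddings into a graph embedding
    logits = model.head(graph_emb).unsqueeze(0)
    loss = nn.functional.cross_entropy(logits, torch.tensor([label]))
    loss.backward()
    optimizer.step()
    return loss.item()
```

A call might look like `train_step(model, opt, segments, history={}, graph_id=0, label=1)`, where `segments` is a list of per-segment node-feature tensors produced by partitioning one large graph. The prediction-head finetuning step that the paper uses to correct the resulting input distribution shift is not shown in this sketch.
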
