Efficient Data Loader for Fast Sampling-Based GNN Training on Large Graphs

Emerging graph neural networks (GNNs) have extended the success of deep learning from datasets such as images and text to more complex graph-structured data. To leverage GPU accelerators, existing frameworks combine mini-batch training with sampling for effective and efficient model training on large graphs. However, this setup faces a scalability issue: loading rich vertex features from CPU to GPU over a limited-bandwidth link usually dominates the training cycle. In this article, we propose PaGraph, a novel, efficient data loader that supports general sampling-based GNN training on a single server with multiple GPUs. PaGraph significantly reduces data loading time by exploiting spare GPU resources to cache frequently accessed graph data. It also embodies a lightweight yet effective caching policy that jointly accounts for the graph structure and the data access patterns of sampling-based GNN training. Furthermore, to scale out across multiple GPUs, PaGraph develops a fast, GNN-computation-aware partition algorithm that avoids cross-partition accesses during data-parallel training and improves cache efficiency. Finally, it overlaps data loading with GNN computation to further hide loading costs. Evaluations on two representative GNN models, GCN and GraphSAGE, with two sampling methods, neighbor sampling and layer-wise sampling, show that PaGraph can eliminate data loading time from the GNN training pipeline and achieve up to a 4.8× speedup over state-of-the-art baselines. Combined with a preprocessing optimization, PaGraph further delivers up to a 16.0× end-to-end speedup.
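
To make the cache design concrete, the sketch below shows one plausible reading of a structure-aware static caching policy: pin the features of high out-degree vertices in GPU memory up to a budget before training starts, and fall back to CPU-to-GPU transfers for everything else. This is a minimal PyTorch sketch under those assumptions; the class `StaticFeatureCache`, its methods, and the use of out-degree as the hotness metric are illustrative, not PaGraph's actual API.

```python
import torch

class StaticFeatureCache:
    """Static GPU cache for vertex features, filled once before training.

    Hypothetical sketch: vertices are ranked by out-degree, on the
    assumption that sampling-based training touches high-degree
    vertices most often, so caching them maximizes the hit rate.
    """

    def __init__(self, cpu_feats, out_degrees, budget, device="cuda:0"):
        self.cpu_feats = cpu_feats            # [N, F] features in host memory
        self.device = device
        # Pin the `budget` highest out-degree vertices on the GPU.
        hot = torch.argsort(out_degrees, descending=True)[:budget]
        self.gpu_feats = cpu_feats[hot].to(device)
        # Map global vertex id -> cache slot; -1 means "not cached".
        self.slot = torch.full((cpu_feats.shape[0],), -1, dtype=torch.long)
        self.slot[hot] = torch.arange(budget)

    def gather(self, nids):
        """Assemble mini-batch features, mixing cache hits and CPU loads."""
        slots = self.slot[nids]               # nids: 1-D CPU LongTensor
        hit = slots >= 0
        out = torch.empty((nids.numel(), self.cpu_feats.shape[1]),
                          device=self.device)
        hit_dev = hit.to(self.device)
        # Cache hits: GPU-to-GPU row copies, avoiding the PCIe link.
        out[hit_dev] = self.gpu_feats[slots[hit].to(self.device)]
        # Cache misses: the only traffic that crosses CPU-to-GPU bandwidth.
        out[~hit_dev] = self.cpu_feats[nids[~hit]].to(self.device)
        return out
```

A static cache along these lines matches the "lightweight" claim in the abstract: because the hot set is fixed by graph structure, there is no per-iteration eviction bookkeeping on the critical path, in contrast to LRU-style dynamic caches.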
