P3: Distributed Deep Graph Learning at Scale

Graph Neural Networks (GNNs) have gained significant attention in recent years and have become one of the fastest-growing subareas of deep learning. While several new GNN architectures have been proposed, the scale of real-world graphs, in many cases billions of nodes and edges, poses challenges during model training. In this paper, we present P3, a system that focuses on scaling GNN model training to large real-world graphs in a distributed setting. We observe that the scalability challenges in training GNNs are fundamentally different from those in training classical deep neural networks or in distributed graph processing, and that commonly used techniques, such as intelligent partitioning of the graph, do not yield the desired results. Based on this observation, P3 proposes a new approach to distributed GNN training that effectively eliminates high communication and partitioning overheads and couples this with a new execution strategy based on pipelined push-pull parallelism for fast model training. For generality, P3 exposes a simple API that captures many different classes of GNN architectures. When further combined with a simple caching strategy, P3 outperforms existing state-of-the-art distributed GNN frameworks by up to 7× in our evaluation.
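To make the claim of a small API covering many GNN classes concrete, the sketch below shows how a GraphSAGE-style layer can be written against generic per-layer hooks. This is a minimal, hypothetical illustration and not the actual P3 API; the hook names (gather, apply), the mean aggregation, and the tensor shapes are all assumptions for exposition.

```python
# Hypothetical sketch of a hook-based message-passing layer (not the P3 API).
# Many GNN variants differ only in how they aggregate neighbors and how they
# combine the aggregate with a node's own feature, which is what the two hooks
# below capture.
import torch
import torch.nn as nn


class MessagePassingLayer(nn.Module):
    """A GraphSAGE-style layer expressed through gather/apply hooks."""

    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.linear = nn.Linear(2 * in_dim, out_dim)

    def gather(self, neighbor_feats: torch.Tensor) -> torch.Tensor:
        # Aggregate sampled neighbor features; mean aggregation for illustration.
        return neighbor_feats.mean(dim=1)

    def apply(self, self_feat: torch.Tensor, aggregated: torch.Tensor) -> torch.Tensor:
        # Combine a node's own feature with its aggregated neighborhood.
        return torch.relu(self.linear(torch.cat([self_feat, aggregated], dim=-1)))

    def forward(self, self_feat: torch.Tensor, neighbor_feats: torch.Tensor) -> torch.Tensor:
        return self.apply(self_feat, self.gather(neighbor_feats))


# Usage: a mini-batch of 4 target nodes, each with 8 sampled neighbors,
# 16-dimensional input features, 32-dimensional output features.
layer = MessagePassingLayer(16, 32)
out = layer(torch.randn(4, 16), torch.randn(4, 8, 16))
print(out.shape)  # torch.Size([4, 32])
```

Swapping the gather hook (e.g., sum, max, or attention-weighted aggregation) yields other architectures without changing the surrounding training loop, which is the kind of generality the abstract refers to.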
