NeutronStar: Distributed GNN Training with Hybrid Dependency Management

GNN training must resolve vertex dependencies: each vertex's representation update depends on the representations of its neighbors. Existing distributed GNN systems adopt either a dependencies-cached approach or a dependencies-communicated approach. Through extensive experiments and analysis, we find that which approach performs best is determined by a combination of factors, including the input graph, the model configuration, and the underlying computing cluster environment. Supporting all GNN training workloads with a single approach therefore often yields suboptimal performance. We study the relevant factors for each GNN training job before execution and choose the best-fit approach accordingly. We propose a hybrid dependency-handling approach that adaptively combines the merits of the two approaches at runtime. Based on this hybrid approach, we further develop a distributed GNN training system, NeutronStar, which delivers high-performance GNN training automatically. NeutronStar is further empowered by effective optimizations in CPU-GPU computation and data processing. Our experimental results on a 16-node Aliyun cluster demonstrate that NeutronStar achieves 1.81X-14.25X speedups over existing GNN systems, including DistDGL and ROC.
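
To make the dependency trade-off concrete, below is a minimal Python sketch of the two strategies the abstract contrasts. This is not NeutronStar's actual implementation: the function names, the dict-based graph layout, and the memory-budget heuristic in choose_strategy are all hypothetical, and the real decision in the paper additionally weighs the graph input, model configuration, and cluster environment.

```python
# Hypothetical sketch (not NeutronStar's code) of the two dependency-handling
# strategies. Vertex embeddings are NumPy arrays; the graph is a dict mapping
# each local vertex to its in-neighbors.
import numpy as np

def aggregate_cached(h_local, h_cached, in_neighbors):
    """Dependencies-cached: embeddings of remote neighbors were replicated
    into h_cached ahead of time, so aggregation is purely local reads."""
    out = {}
    for v, nbrs in in_neighbors.items():
        msgs = [h_local[u] if u in h_local else h_cached[u] for u in nbrs]
        out[v] = np.mean(msgs, axis=0)  # mean aggregator, one common GNN choice
    return out

def aggregate_communicated(h_local, in_neighbors, fetch_remote):
    """Dependencies-communicated: embeddings of remote neighbors are pulled
    over the network (fetch_remote) at every layer instead of being cached."""
    out = {}
    for v, nbrs in in_neighbors.items():
        msgs = [h_local[u] if u in h_local else fetch_remote(u) for u in nbrs]
        out[v] = np.mean(msgs, axis=0)
    return out

def choose_strategy(num_remote_nbrs, hidden_dim, cache_mem_budget, dtype_bytes=4):
    """Toy decision rule: cache remote dependencies when the replicas fit in
    a memory budget, otherwise communicate them per layer. Illustrative only;
    it captures just the flavor of the factor-based decision."""
    cache_bytes = num_remote_nbrs * hidden_dim * dtype_bytes
    return "cached" if cache_bytes <= cache_mem_budget else "communicated"

# Example: vertex 0 depends on local vertex 1 and remote vertex 2.
h_local = {0: np.ones(4), 1: np.zeros(4)}
h_cached = {2: np.full(4, 2.0)}  # pre-replicated copy of remote vertex 2
print(aggregate_cached(h_local, h_cached, {0: [1, 2]}))
print(choose_strategy(num_remote_nbrs=10_000, hidden_dim=256,
                      cache_mem_budget=8 * 2**20))
```

The point of the hybrid design is that neither function dominates: caching trades memory and staleness-management for zero per-layer network traffic, while communicating trades bandwidth for a smaller memory footprint, so the better choice shifts with the workload.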

[1] Fei Sun et al. Graph Neural Networks in Recommender Systems: A Survey. ACM Comput. Surv., 2020.

[2] Lei Zou et al. Accelerating Triangle Counting on GPU. SIGMOD Conference, 2021.

[3] Miryung Kim et al. Dorylus: Affordable, Scalable, and Accurate GNN Training with Distributed CPU Servers and Serverless Threads. OSDI, 2021.

[4] James Cheng et al. DGCL: An Efficient Communication Library for Distributed GNN Training. EuroSys, 2021.

[5] James Cheng et al. Seastar: Vertex-Centric Programming for Graph Neural Networks. EuroSys, 2021.

[6] Wenyuan Yu et al. FlexGraph: A Flexible and Efficient Distributed Framework for GNN Training. EuroSys, 2021.

[7] Yongchao Liu et al. GraphTheta: A Distributed Graph Neural Network Learning System With Flexible Training Strategy. ArXiv, 2021.

[8] Dhiraj D. Kalamkar et al. DistGNN: Scalable Distributed Training for Large-Scale Graph Neural Networks. SC21: International Conference for High Performance Computing, Networking, Storage and Analysis, 2021.

[9] Wen-mei W. Hwu et al. Large Graph Convolutional Network Training with GPU-Oriented Data Communication Architecture. Proc. VLDB Endow., 2021.

[10] Lei Deng et al. GNNAdvisor: An Adaptive and Efficient Runtime System for GNN Acceleration on GPUs. OSDI, 2020.

[11] Lei Chen. Deep Learning and Practice with MindSpore. 2021.

[12] Anand Padmanabha Iyer et al. P3: Distributed Deep Graph Learning at Scale. OSDI, 2021.

[13] G. Karypis et al. DistDGL: Distributed Graph Neural Network Training for Billion-Scale Graphs. IEEE/ACM 10th Workshop on Irregular Applications: Architectures and Algorithms (IA3), 2020.

[14] Bingsheng He et al. G3. Proc. VLDB Endow., 2020.

[15] Shen Li et al. PyTorch Distributed. Proc. VLDB Endow., 2020.

[16] Ge Yu et al. Automating Incremental and Asynchronous Evaluation for Recursive Aggregate Data Processing. SIGMOD Conference, 2020.

[17] K. Yelick et al. Reducing Communication in Graph Neural Network Training. SC20: International Conference for High Performance Computing, Networking, Storage and Analysis, 2020.

[18] Alexander Aiken et al. Improving the Accuracy, Scalability, and Performance of Graph Neural Networks with Roc. MLSys, 2020.

[19] Ziqi Liu et al. AGL. Proc. VLDB Endow., 2020.

[20] Natalia Gimelshein et al. PyTorch: An Imperative Style, High-Performance Deep Learning Library. NeurIPS, 2019.

[21] Douwe Kiela et al. Hyperbolic Graph Neural Networks. NeurIPS, 2019.

[22] Alex Smola et al. Deep Graph Library: Towards Efficient and Scalable Deep Learning on Graphs. ArXiv, 2019.

[23] Yafei Dai et al. NeuGraph: Parallel Deep Neural Network Computation on Large Graphs. USENIX ATC, 2019.

[24] Jan Eric Lenssen et al. Fast Graph Representation Learning with PyTorch Geometric. ArXiv, 2019.

[25] Chang Zhou et al. AliGraph: A Comprehensive Graph Neural Network Platform. Proc. VLDB Endow., 2019.

[26] Hao Wang et al. SEP-Graph: Finding Shortest Execution Paths for Graph Processing under a Hybrid Framework on GPU. PPoPP, 2019.

[27] Binyu Zang et al. PowerLyra: Differentiated Graph Computation and Partitioning on Skewed Graphs. TOPC, 2019.

[28] Jure Leskovec et al. How Powerful Are Graph Neural Networks? ICLR, 2018.

[29] Philip S. Yu et al. A Comprehensive Survey on Graph Neural Networks. IEEE Transactions on Neural Networks and Learning Systems, 2019.

[30] Pietro Liò et al. Graph Attention Networks. ICLR, 2017.

[31] Yinghui Wu et al. Parallelizing Sequential Graph Computations. ACM Trans. Database Syst., 2018.

[32] Alexander Aiken et al. A Distributed Multi-GPU System for Fast Graph Processing. Proc. VLDB Endow., 2017.

[33] Zhenguo Li et al. Graph Edge Partitioning via Neighborhood Heuristic. KDD, 2017.

[34] Jure Leskovec et al. Inductive Representation Learning on Large Graphs. NIPS, 2017.

[35] Diego Marcheggiani et al. Encoding Sentences with Graph Convolutional Networks for Semantic Role Labeling. EMNLP, 2017.

[36] Sivasankaran Rajamanickam et al. Partitioning Trillion-Edge Graphs in Minutes. IEEE International Parallel and Distributed Processing Symposium (IPDPS), 2017.

[37] Max Welling et al. Semi-Supervised Classification with Graph Convolutional Networks. ICLR, 2016.

[38] Wenguang Chen et al. Gemini: A Computation-Centric Distributed Graph Processing System. OSDI, 2016.

[39] Xavier Bresson et al. Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering. NIPS, 2016.

[40] Yuan Yu et al. TensorFlow: A System for Large-Scale Machine Learning. OSDI, 2016.

[41] Richard S. Zemel et al. Gated Graph Sequence Neural Networks. ICLR, 2015.

[42] Charalampos E. Tsourakakis et al. FENNEL: Streaming Graph Partitioning for Massive Scale Graphs. WSDM, 2014.

[43] Jérôme Kunegis et al. KONECT: The Koblenz Network Collection. WWW, 2013.

[44] Jure Leskovec et al. Defining and Evaluating Network Communities Based on Ground-Truth. Knowledge and Information Systems, 2012.

[45] L. Takac. Data Analysis in Public Social Networks. 2012.

[46] Hosung Park et al. What Is Twitter, a Social Network or a News Media? WWW, 2010.

[47] Jure Leskovec et al. Community Structure in Large Networks: Natural Cluster Sizes and the Absence of Large Well-Defined Clusters. Internet Math., 2008.

[48] Lise Getoor et al. Collective Classification in Network Data. AI Mag., 2008.

[49] Jon M. Kleinberg et al. Group Formation in Large Social Networks: Membership, Growth, and Evolution. KDD, 2006.

[50] Vipin Kumar et al. A Fast and High Quality Multilevel Scheme for Partitioning Irregular Graphs. SIAM J. Sci. Comput., 1998.

[51] Vipin Kumar et al. Parallel Multilevel Graph Partitioning. Proceedings of International Conference on Parallel Processing, 1996.