Understanding GNN Computational Graph: A Coordinated Computation, IO, and Memory Perspective

Graph Neural Networks (GNNs) have been widely used across domains, but GNNs with sophisticated computational graphs incur high latency and large memory consumption. Optimizing the GNN computational graph faces three challenges: (1) Redundant neural operator computation. The same data are propagated through the graph structure to perform the same neural operation multiple times, leading to redundant computation that accounts for 92.4% of total operators. (2) Inconsistent thread mapping. Efficient thread mapping schemes differ between vertex-centric and edge-centric operators; this inconsistency prohibits the operator fusion that would reduce memory IO. (3) Excessive intermediate data. In GNN training, which is usually performed concurrently with inference, intermediate data must be stored for the backward pass, consuming 91.9% of the total memory requirement. To tackle these challenges, we propose the following designs, which optimize the GNN computational graph from a novel coordinated computation, IO, and memory perspective: (1) Propagation-postponed operator reorganization. We reorganize operators so that neural operations are performed before propagation, eliminating the redundant computation. (2) Unified thread mapping for fusion. We propose a unified thread mapping scheme for both vertex- and edge-centric operators, enabling fusion and reducing IO. (3) Intermediate data recomputation. Intermediate data are recomputed during the backward pass to reduce total memory consumption. Extensive experiments on three typical GNN models show that we achieve up to 2.75× end-to-end speedup, 6.89× less memory IO, and 7.73× less memory consumption than state-of-the-art frameworks.
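The reorganization in design (1) can be illustrated with a minimal PyTorch sketch. The graph sizes, the src/dst edge arrays, and the single nn.Linear operator below are illustrative assumptions, not the paper's exact operators; the point is only the change in operator order:

```python
import torch
import torch.nn as nn

# Illustrative sizes and random graph data (assumptions, not from the paper).
V, E, F = 1000, 10000, 64
x = torch.randn(V, F)                    # per-vertex input features
src = torch.randint(0, V, (E,))          # source vertex of each edge
dst = torch.randint(0, V, (E,))          # destination vertex of each edge
linear = nn.Linear(F, F)

# Propagate-then-compute: the gather duplicates each vertex feature once
# per incident edge, so the dense neural operator runs on E rows.
e_naive = linear(x[src]) + linear(x[dst])

# Compute-then-propagate (propagation postponed): transform each vertex
# exactly once (V rows), then gather the already-transformed features.
h = linear(x)
e_reorg = h[src] + h[dst]

# Same result, with the dense operator cost reduced from E rows to V rows.
assert torch.allclose(e_naive, e_reorg, atol=1e-5)
```

Because the linear layer acts row-wise, it commutes with the gather, which is why the two orderings agree while the second avoids transforming each vertex once per incident edge.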

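Design (3) is an instance of trading compute for memory in the backward pass. The sketch below shows the general recompute-in-backward idea using PyTorch's generic gradient checkpointing API under assumed shapes and a toy edge-stage function; the paper's system recomputes GNN-specific intermediates with its own mechanism rather than this API:

```python
import torch
from torch.utils.checkpoint import checkpoint

# Illustrative shapes and a toy edge-wise stage (assumptions, not the paper's code).
V, E, F = 1000, 10000, 64
h = torch.randn(V, F, requires_grad=True)
src = torch.randint(0, V, (E,))
dst = torch.randint(0, V, (E,))

def edge_stage(h, src, dst):
    # Produces an O(E)-sized intermediate; without checkpointing, autograd
    # would keep it (and the gathered inputs) alive for the backward pass.
    return torch.relu(h[src] * h[dst])

# checkpoint() discards edge_stage's intermediates after the forward pass
# and recomputes them during backward, cutting peak memory at the cost of
# one extra forward through this stage.
e = checkpoint(edge_stage, h, src, dst, use_reentrant=False)
loss = e.sum()
loss.backward()
print(h.grad.shape)  # gradients flow as usual: torch.Size([1000, 64])
```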