Hardware Acceleration of Large Scale GCN Inference

Graph Convolutional Networks (GCNs) have become state-of-the-art deep learning models for representation learning on graphs. Hardware acceleration of GCN inference is challenging due to 1) the massive size of the input graph, 2) the heterogeneous workload of GCN inference, which consists of both sparse and dense matrix operations, and 3) irregular information propagation along the edges during computation. To address these challenges, we propose an algorithm-architecture co-optimization to accelerate large-scale GCN inference on FPGA. We first perform data partitioning so that each partition fits in the limited on-chip memory. Then, we apply a two-phase preprocessing algorithm consisting of sparsification and node reordering. The first phase (sparsification) eliminates edge connections of high-degree nodes by merging common neighbor nodes. The second phase (reordering) groups adjacent nodes to improve on-chip data reuse. Incorporating these algorithmic optimizations, we propose a generic FPGA architecture that pipelines the two major computational kernels in GCN: aggregation and transformation. The flexible data path and task scheduling strategy of our design support various GCN models and lead to high-throughput inference. We evaluate our design on a state-of-the-art FPGA platform using three large-scale datasets: Flickr, Reddit, and Yelp. Compared with state-of-the-art multi-core and GPU baselines, our design improves throughput by up to $30\times$ and $2\times$, respectively.
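To make the two kernels concrete, the following is a minimal software sketch of one GCN layer, assuming the common formulation in which neighbor features are first summed along weighted edges (aggregation, the sparse phase) and then multiplied by a weight matrix and passed through ReLU (transformation, the dense phase). The function names, the edge-list representation, and the `part_size` parameter are illustrative assumptions and not the design described above; the loop over destination-node ranges only mimics the idea of processing one partition at a time so that its working set could fit in a small buffer.

```python
from typing import List, Tuple

Edge = Tuple[int, int, float]  # (source node, destination node, edge weight)


def aggregate(edges: List[Edge], features: List[List[float]],
              part_size: int) -> List[List[float]]:
    """Sparse phase: weighted sum of neighbor features, one partition at a time."""
    num_nodes = len(features)
    dim = len(features[0])
    out = [[0.0] * dim for _ in range(num_nodes)]
    for start in range(0, num_nodes, part_size):
        end = min(start + part_size, num_nodes)
        # Only edges whose destination lies in [start, end) update this partition.
        for src, dst, w in edges:
            if start <= dst < end:
                for k in range(dim):
                    out[dst][k] += w * features[src][k]
    return out


def transform(agg: List[List[float]],
              weight: List[List[float]]) -> List[List[float]]:
    """Dense phase: multiply by the weight matrix, then apply ReLU."""
    rows, cols = len(weight), len(weight[0])
    return [
        [max(0.0, sum(row[k] * weight[k][j] for k in range(rows)))
         for j in range(cols)]
        for row in agg
    ]


def gcn_layer(edges: List[Edge], features: List[List[float]],
              weight: List[List[float]], part_size: int = 2) -> List[List[float]]:
    """One layer: aggregation followed by transformation."""
    return transform(aggregate(edges, features, part_size), weight)
```

In this sketch the result does not depend on `part_size`; partitioning only changes the order in which destination nodes are visited, which is the property that would let a hardware design trade buffer size against the number of passes over the edge list.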
