Accelerating Large Scale GCN Inference on FPGA
We propose an algorithm-architecture co-optimization framework to accelerate large-scale GCN inference on FPGA. We first partition the input graph so that each partition fits in the limited on-chip memory of the FPGA. We then apply a two-phase pre-processing algorithm consisting of sparsification and node reordering. The first phase (sparsification) eliminates edge connections of high-degree nodes by merging common neighbor nodes. The second phase (reordering) groups densely connected neighborhoods together to improve on-chip data reuse. Incorporating these algorithmic optimizations, we propose an FPGA architecture that efficiently executes the two key computational kernels of GCN: feature aggregation and weight transformation. We evaluate our design on a state-of-the-art FPGA device. Compared with multi-core and GPU baselines, our design reduces inference latency by up to $30\times$ and $2\times$, respectively.
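For reference, the two kernels correspond to the two matrix products in a GCN layer, $\sigma(\hat{A} H W)$ in Kipf and Welling's formulation: feature aggregation is a sparse-dense multiply over the normalized adjacency matrix, and weight transformation is a dense multiply with the layer weights. Below is a minimal NumPy/SciPy sketch of one such layer; the function names, the toy graph, and the feature sizes are illustrative assumptions, not the paper's FPGA design.

```python
import numpy as np
import scipy.sparse as sp


def normalize_adj(adj: sp.csr_matrix) -> sp.csr_matrix:
    # Symmetric normalization D^{-1/2} (A + I) D^{-1/2}, as in Kipf & Welling.
    adj = (adj + sp.eye(adj.shape[0], format="csr")).tocsr()  # add self-loops
    deg = np.asarray(adj.sum(axis=1)).ravel()
    d_inv_sqrt = sp.diags(1.0 / np.sqrt(deg))
    return (d_inv_sqrt @ adj @ d_inv_sqrt).tocsr()


def gcn_layer(adj_norm: sp.csr_matrix, h: np.ndarray, w: np.ndarray) -> np.ndarray:
    # Kernel 1, feature aggregation: sparse-dense matmul over the normalized
    # adjacency. Its irregular memory access pattern is what the paper's
    # node-reordering step targets for better on-chip data reuse.
    aggregated = adj_norm @ h
    # Kernel 2, weight transformation: dense matmul with the layer weights,
    # followed by a ReLU non-linearity.
    return np.maximum(aggregated @ w, 0.0)


# Toy usage (hypothetical sizes): 4 nodes, 8 input and 16 output features.
rng = np.random.default_rng(0)
adj = sp.random(4, 4, density=0.5, format="csr", random_state=0)
adj = ((adj + adj.T) > 0).astype(float).tocsr()  # symmetrize: undirected graph
h = rng.standard_normal((4, 8))
w = rng.standard_normal((8, 16))
out = gcn_layer(normalize_adj(adj), h, w)  # shape (4, 16)
```

Aggregation is dominated by irregular sparse accesses while transformation is a regular dense matmul, which is why the partitioning and reordering steps above focus on making the aggregation kernel's working set fit and reuse well in on-chip memory.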