Paper Information - RFC-HyPGCN: A Runtime Sparse Feature Compress Accelerator for Skeleton-Based GCNs Action Recognition Model with Hybrid Pruning


Skeleton-based Graph Convolutional Network (GCN) models for action recognition have achieved excellent prediction accuracy. However, because of their large model size and computational complexity, action recognition GCNs such as 2s-AGCN deliver insufficient power efficiency and throughput on GPUs. The demand for model reduction and hardware acceleration in low-power GCN action recognition applications is therefore growing steadily. To address these challenges, this paper proposes RFC-HyPGCN, a runtime sparse feature compress accelerator with a hybrid pruning method. First, the method skips both graph and spatial convolution workloads by reorganizing the multiplication order. Following the channel-pruning dataflow of the spatial convolutions, a coarse-grained pruning method for temporal filters is designed, together with sampling-like fine-grained pruning along the time dimension. We then present an architecture in which all convolutional layers are mapped on chip to pursue high throughput. To further reduce storage resource utilization, an online sparse feature compress format is put forward: features are divided and encoded into several banks according to this format, and bank storage is then split into depth-variable mini-banks. Furthermore, this work applies quantization, input skipping, and intra-PE dynamic data scheduling to accelerate the model. In experiments, the proposed pruning method is applied to 2s-AGCN, achieving a 3.0x-8.4x model compression ratio and 73.20% graph-skipping efficiency with balancing weight pruning. Implemented on a Xilinx XCKU-115 FPGA, the proposed architecture has a peak performance of 1142 GOP/s and achieves up to 9.19x and 3.91x speedup over the high-end NVIDIA 2080Ti and NVIDIA V100 GPUs, respectively. Compared with the latest accelerator for action recognition GCN models, our design reaches a 22.9x speedup and a 28.93% improvement in DSP efficiency.
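
The abstract does not spell out how the multiplication order is reorganized, so the following is only a minimal sketch of the general idea, not the paper's dataflow. It assumes a single layer of the form Y = W · X · A, where X is a feature matrix (input channels by joints), A is a joint adjacency matrix, and W is a channel-mixing weight matrix whose output channels have already been pruned. The sizes, the matrices, and the `matmul` helper are all hypothetical. Because matrix multiplication is associative, both orders give the same result, but applying the pruned W first shrinks the operand of the graph multiplication.

```python
def matmul(a, b):
    """Dense product of two list-of-lists matrices; returns (result, MAC count)."""
    rows, inner, cols = len(a), len(b), len(b[0])
    out = [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
           for i in range(rows)]
    return out, rows * inner * cols

# Hypothetical sizes: 4 input channels, 2 output channels kept after pruning, 3 joints.
C_IN, C_OUT, V = 4, 2, 3
W = [[(i + j) % 3 - 1 for j in range(C_IN)] for i in range(C_OUT)]   # C_OUT x C_IN
X = [[i * V + j for j in range(V)] for i in range(C_IN)]             # C_IN x V
A = [[1 if abs(i - j) <= 1 else 0 for j in range(V)] for i in range(V)]  # V x V

# Order 1: graph multiplication first, Y = W (X A).
xa, macs_xa = matmul(X, A)
graph_first, macs_w = matmul(W, xa)
graph_first_macs = macs_xa + macs_w

# Order 2: pruned weights first, Y = (W X) A.
wx, macs_wx = matmul(W, X)
weight_first, macs_a = matmul(wx, A)
weight_first_macs = macs_wx + macs_a

print(graph_first == weight_first)          # same result either way
print(graph_first_macs, weight_first_macs)  # MAC counts for each order
```

With these toy sizes the graph multiplication runs over 2 rows instead of 4 when the weights are applied first, so the saving grows with the number of pruned channels. The sketch counts dense multiply-accumulates only and does not model skipping of zero entries in A.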
