论文信息 - PAI-FCNN: FPGA Based Inference System for Complex CNN Models

PAI-FCNN: FPGA Based Inference System for Complex CNN Models

Convolutional Neural Network (CNN) models are becoming complex with advanced OPs and structures, which introduces design challenges for FPGA-based system. In this paper, we present the design of an FPGA-based CNN inference system, PAI-FCNN, to support modern complex CNN models. PAI-FCNN consists of scalable hardware design and a model reconstruction flow in software compiler. In this way, advanced OPs like Deconv, Conv with upsampling, Dilated Conv, Concatenation can be processed by PAI-FCNN with high performance and hardware efficiency. PAI-FCNN also incorporates reduced precision to boost computing capacity, and the emerging CNN-RNN (Recurrent Neural Network) hybrid models are supported. Our experiments on both PC and embedded FPGA platforms show that the system consistently performs in an efficient manner. PAI-FCNN achieves better throughput and power efficiency than GPU solutions.

[1] Xi Chen,et al. FP-DNN: An Automated Framework for Mapping Deep Neural Networks onto FPGAs with RTL-HLS Hybrid Templates , 2017, 2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM).

[2] Dumitru Erhan,et al. Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[3] Jason Cong,et al. Caffeine: Towards uniformed representation and acceleration for deep convolutional neural networks , 2016, 2016 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).

[4] Eugenio Culurciello,et al. ENet: A Deep Neural Network Architecture for Real-Time Semantic Segmentation , 2016, ArXiv.

[5] Shengen Yan,et al. Exploring heterogeneous algorithms for accelerating deep convolutional neural networks on FPGAs , 2017, 2017 54th ACM/EDAC/IEEE Design Automation Conference (DAC).

[6] Ali Farhadi,et al. You Only Look Once: Unified, Real-Time Object Detection , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[7] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[8] Frank Puppe,et al. Calamari - A High-Performance Tensorflow-based Deep Learning Package for Optical Character Recognition , 2018, Digit. Humanit. Q..

[9] Zelong Wang,et al. Towards a Uniform Template-based Architecture for Accelerating 2D and 3D CNNs on FPGA , 2018, FPGA.

[10] Song Han,et al. Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding , 2015, ICLR.

[11] Houqiang Li,et al. Quantization Networks , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[12] Nam Sung Kim,et al. FlexiGAN: An End-to-End Solution for FPGA Acceleration of Generative Adversarial Networks , 2018, 2018 IEEE 26th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM).

[13] Thomas Brox,et al. U-Net: Convolutional Networks for Biomedical Image Segmentation , 2015, MICCAI.

[14] Peng Zhang,et al. Automated systolic array architecture synthesis for high throughput CNN inference on FPGAs , 2017, 2017 54th ACM/EDAC/IEEE Design Automation Conference (DAC).

[15] Yu Wang,et al. Going Deeper with Embedded FPGA Platform for Convolutional Neural Network , 2016, FPGA.

[16] Jinjun Xiong,et al. DNNBuilder: an Automated Tool for Building High-Performance DNN Hardware Accelerators for FPGAs , 2018, 2018 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).

[17] Andrew C. Ling,et al. An OpenCL™ Deep Learning Accelerator on Arria 10 , 2017, FPGA.

[18] Igor Carron,et al. XNOR-Net: ImageNet Classification Using Binary Convolutional Neural Networks , 2016 .

[19] Yu Cao,et al. Throughput-Optimized OpenCL-based FPGA Accelerator for Large-Scale Convolutional Neural Networks , 2016, FPGA.

[20] Shengen Yan,et al. Evaluating Fast Algorithms for Convolutional Neural Networks on FPGAs , 2017, 2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM).

[21] Sergey Ioffe,et al. Rethinking the Inception Architecture for Computer Vision , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[22] Xiang Bai,et al. An End-to-End Trainable Neural Network for Image-Based Sequence Recognition and Its Application to Scene Text Recognition , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[23] Jing Li,et al. Improving the Performance of OpenCL-based FPGA Accelerator for Convolutional Neural Network , 2017, FPGA.

[24] Liqiang Lu,et al. An Efficient Hardware Accelerator for Sparse Convolutional Neural Networks on FPGAs , 2019, 2019 IEEE 27th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM).

[25] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).