PAI-FCNN: FPGA Based Inference System for Complex CNN Models

Convolutional Neural Network (CNN) models are growing more complex, with advanced operators (OPs) and structures that create design challenges for FPGA-based systems. In this paper, we present PAI-FCNN, an FPGA-based CNN inference system designed to support modern complex CNN models. PAI-FCNN combines a scalable hardware design with a model reconstruction flow in the software compiler. This combination lets PAI-FCNN process advanced OPs such as Deconv, Conv with upsampling, Dilated Conv, and Concatenation with high performance and hardware efficiency. PAI-FCNN also uses reduced precision to boost computing capacity and supports emerging CNN-RNN (Recurrent Neural Network) hybrid models. Experiments on both PC and embedded FPGA platforms show consistently efficient performance, and PAI-FCNN achieves better throughput and power efficiency than GPU solutions.
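
The abstract does not spell out how the model reconstruction flow rewrites advanced OPs. As a rough illustration of the general idea only, the sketch below uses a well-known identity: a dilated convolution gives the same result as a standard convolution whose kernel has zeros inserted between its taps. The function names `conv1d` and `expand_kernel` are hypothetical, the example is 1D for brevity, and nothing here should be read as the rewrite PAI-FCNN actually performs.

```python
def conv1d(signal, kernel, dilation=1):
    """Valid-mode 1D cross-correlation with an optional dilation factor."""
    span = (len(kernel) - 1) * dilation + 1  # receptive field of the kernel
    return [
        sum(kernel[k] * signal[i + k * dilation] for k in range(len(kernel)))
        for i in range(len(signal) - span + 1)
    ]


def expand_kernel(kernel, dilation):
    """Insert (dilation - 1) zeros between taps (illustrative rewrite, an assumption)."""
    expanded = [0] * ((len(kernel) - 1) * dilation + 1)
    for k, weight in enumerate(kernel):
        expanded[k * dilation] = weight
    return expanded


signal = [1, 2, 3, 4, 5, 6, 7, 8]
kernel = [1, -1, 2]

# Dilated OP evaluated directly.
dilated = conv1d(signal, kernel, dilation=2)

# Same OP expressed as a plain convolution over the zero-inserted kernel.
rewritten = conv1d(signal, expand_kernel(kernel, 2))

print(dilated == rewritten)
```

Zero insertion keeps the hardware path uniform but spends multiplications on zero weights, so a production compiler would likely prefer a rewrite that avoids that overhead.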
