SparkNoC: An energy-efficient FPGA-based accelerator using an optimized lightweight CNN for edge computing

Abstract Over the past few years, Convolutional Neural Networks (CNNs) have been widely adopted in a broad range of AI applications and have achieved remarkable results. Deploying CNN inference on edge devices has become a research hotspot in edge computing. For mobile embedded devices with constrained resources and tight power budgets, the large number of parameters and the heavy computational cost of CNNs make such deployment challenging. To address this challenge, this study develops a lightweight neural network architecture, termed SparkNet, that significantly reduces both weight parameters and computation demands. The feasibility of SparkNet is verified on four datasets: MNIST, CIFAR-10, CIFAR-100, and SVHN. SparkNet compresses the convolutional neural network by a factor of 150x. Compared with GPUs and ASICs, an FPGA-based accelerator offers clear advantages in reconfigurability, flexibility, power efficiency, and massive parallelism. Both the SparkNet model and the proposed accelerator architecture are designed specifically for FPGA. SparkNet on chip (SparkNoC), which maps every layer of the network to its own dedicated hardware unit so that all layers operate simultaneously in a pipeline, has been implemented on FPGA. The proposed design is evaluated by deploying the SparkNet model on an Intel Arria 10 GX1150 FPGA platform. Experimental results show that the fully pipelined CNN hardware accelerator achieves 337.2 GOP/s at an energy efficiency of 44.48 GOP/s/W, outperforming previous approaches.
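The abstract does not detail SparkNet's block structure, but the scale of a 150x model compression is easier to appreciate with a back-of-the-envelope parameter count. The sketch below is a hypothetical illustration, not SparkNet's actual design: it compares a standard 3x3 convolution with a depthwise-separable substitution of the kind used by lightweight networks such as MobileNets; all function names and the 256-channel layer size are assumptions made for the example.

```python
# Hypothetical illustration only (SparkNet's real block design is not given
# in the abstract): parameter counts for a standard 3x3 convolution versus
# a depthwise-separable replacement, a common lightweight-CNN substitution.

def standard_conv_params(c_in: int, c_out: int, k: int = 3) -> int:
    """Weights in a standard k x k convolution (biases omitted)."""
    return c_in * c_out * k * k

def depthwise_separable_params(c_in: int, c_out: int, k: int = 3) -> int:
    """Depthwise k x k conv (one filter per input channel) followed by a
    1x1 pointwise conv that mixes channels."""
    return c_in * k * k + c_in * c_out

if __name__ == "__main__":
    c_in, c_out = 256, 256          # assumed layer width for illustration
    std = standard_conv_params(c_in, c_out)
    sep = depthwise_separable_params(c_in, c_out)
    print(f"standard:  {std:,} weights")    # 589,824
    print(f"separable: {sep:,} weights")    # 67,840
    print(f"reduction: {std / sep:.1f}x")   # ~8.7x for this single layer
```

A per-layer saving of this order, compounded with quantization and a compact overall topology, suggests how aggressive end-to-end compression ratios become reachable; the paper's reported 150x figure would rest on SparkNet's specific combination of such techniques.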
