Cross-layer CNN Approximations for Hardware Implementation

Convolutional Neural Networks (CNNs) are widely used in image classification and object detection applications. Deploying these architectures in embedded systems is challenging because their high computational complexity must be accommodated on platforms with limited hardware resources, such as FPGAs. Since these applications are inherently error-resilient, approximate computing (AC) offers an attractive trade-off between resource utilization and accuracy. In this paper, we study the impact on CNN performance when several approximation techniques are applied simultaneously, focusing on two widely used techniques: quantization and pruning. Our experimental results show that, for CNNs of different parameter sizes and with a 3% loss in accuracy, we obtain 27.9% to 47.2% reductions in computational complexity, in terms of FLOPs, on the CIFAR-10 and MNIST datasets.
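
As a concrete illustration of how the two approximations can be combined, the sketch below applies magnitude-based pruning followed by symmetric uniform quantization to a single convolutional weight tensor. This is a minimal NumPy sketch, not the paper's exact method; the sparsity level, bit width, and tensor shape are illustrative assumptions.

# Illustrative sketch only: magnitude pruning + uniform quantization of one
# weight tensor. Sparsity, bit width, and shape are assumed for the example.
import numpy as np

def prune_by_magnitude(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude weights so that `sparsity` fraction is removed."""
    threshold = np.quantile(np.abs(weights), sparsity)
    return np.where(np.abs(weights) < threshold, 0.0, weights)

def quantize_uniform(weights: np.ndarray, bits: int) -> np.ndarray:
    """Simulate symmetric uniform quantization to `bits` bits (values kept in float)."""
    max_abs = max(float(np.max(np.abs(weights))), 1e-12)
    scale = (2 ** (bits - 1) - 1) / max_abs
    return np.round(weights * scale) / scale

# Example: apply both approximations to a random bank of 3x3 convolution kernels.
rng = np.random.default_rng(0)
conv_weights = rng.standard_normal((64, 32, 3, 3))
approx_weights = quantize_uniform(prune_by_magnitude(conv_weights, sparsity=0.5), bits=8)
print("remaining non-zero weights:", np.count_nonzero(approx_weights))

Pruning reduces the number of multiply-accumulate operations (and hence FLOPs) by removing weights, while quantization shrinks the bit width of the remaining operands; the two are complementary, which is why the paper evaluates them jointly.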
