Quantized Guided Pruning for Efficient Hardware Implementations of Convolutional Neural Networks

Convolutional Neural Networks (CNNs) are state-of-the-art in numerous computer vision tasks such as object classification and detection. However, the large number of parameters they contain leads to high computational complexity and strongly limits their usability in budget-constrained devices such as embedded systems. In this paper, we propose a combination of a new pruning technique and a quantization scheme that effectively reduces the complexity and memory usage of the convolutional layers of CNNs and replaces the costly convolution operation with a low-cost multiplexer. We perform experiments on the CIFAR10, CIFAR100, and SVHN datasets and show that the proposed method achieves almost state-of-the-art accuracy while drastically reducing the computational and memory footprints. We also propose an efficient hardware architecture to accelerate CNN operations; the architecture is pipelined and accommodates multiple layers working concurrently to speed up inference.
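The abstract does not specify the exact pruning criterion or quantization scheme, so the following is only a minimal sketch of the general idea of combining pruning with weight quantization on a convolutional layer, assuming magnitude-based pruning and binary (sign times per-layer scale) quantization in PyTorch; the function name, the sparsity parameter, and the post-training application are illustrative assumptions, not the paper's method.

```python
# Sketch: prune then binarize the weights of a single Conv2d layer.
# Assumptions (not from the abstract): magnitude-based pruning and
# sign*scale binary quantization, applied post-training.
import torch
import torch.nn as nn


def prune_and_quantize_conv(conv: nn.Conv2d, sparsity: float = 0.5) -> torch.Tensor:
    """Return a pruned and binarized copy of the layer's weight tensor.

    sparsity: fraction of weights to zero out (smallest magnitudes first).
    Surviving weights are replaced by sign(w) * mean(|w|), as in
    binary-weight networks; zeros encode the pruned connections.
    """
    w = conv.weight.detach()

    # Magnitude threshold: keep the (1 - sparsity) largest-magnitude weights.
    k = int(sparsity * w.numel())
    threshold = w.abs().flatten().kthvalue(max(k, 1)).values
    mask = (w.abs() > threshold).float()

    # Binary quantization of the surviving weights with a per-layer scale.
    scale = (w.abs() * mask).sum() / mask.sum().clamp(min=1.0)
    return torch.sign(w) * scale * mask


# Example usage on one convolutional layer.
conv = nn.Conv2d(in_channels=16, out_channels=32, kernel_size=3, padding=1)
conv.weight.data = prune_and_quantize_conv(conv, sparsity=0.75)
```

With weights restricted to {-scale, 0, +scale}, each multiplication in the convolution can be replaced by selecting, negating, or skipping an input value, which is what makes a multiplexer-based hardware datapath possible.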
