Convolutional Neural Network Accelerator with Vector Quantization

Deep neural networks (DNNs) have demonstrated impressive performance in many edge computer vision tasks, causing the increasing demand for DNN accelerator on mobile and internet of things (IoT) devices. However, the massive power consumption and storage requirement make the hardware design challenging. In this paper, we introduce a DNN accelerator based on a model compression technique vector quantization (VQ), which can reduce the network model size and computation cost simultaneously. Moreover, a specialized processing element (PE) is designed with various SRAM bank configurations as well as dataflows such that it can support different codebook/kernel sizes, and keep high utilization under small input or output channel numbers. Compared to the state-of-the-art, the proposed accelerator architecture achieves 4.2 times reduction in memory access and 2.05 times throughput per cycle for batch-one inference.

[1]  Vivienne Sze,et al.  14.5 Eyeriss: An energy-efficient reconfigurable accelerator for deep convolutional neural networks , 2016, ISSCC.

[2]  Dan Alistarh,et al.  Model compression via distillation and quantization , 2018, ICLR.

[3]  Song Han,et al.  Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding , 2015, ICLR.

[4]  Vivienne Sze,et al.  Designing Energy-Efficient Convolutional Neural Networks Using Energy-Aware Pruning , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[6]  Warren J. Gross,et al.  An Architecture to Accelerate Convolution in Deep Neural Networks , 2018, IEEE Transactions on Circuits and Systems I: Regular Papers.

[7]  Ran El-Yaniv,et al.  Quantized Neural Networks: Training Neural Networks with Low Precision Weights and Activations , 2016, J. Mach. Learn. Res..

[8]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[9]  Kyoung Mu Lee,et al.  Accurate Image Super-Resolution Using Very Deep Convolutional Networks , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  Yurong Chen,et al.  Dynamic Network Surgery for Efficient DNNs , 2016, NIPS.

[11]  Jian Cheng,et al.  Quantized CNN: A Unified Approach to Accelerate and Compress Convolutional Networks , 2018, IEEE Transactions on Neural Networks and Learning Systems.

[12]  Xiaoou Tang,et al.  Learning a Deep Convolutional Network for Image Super-Resolution , 2014, ECCV.

[13]  Dajiang Zhou,et al.  Chain-NN: An energy-efficient 1D chain architecture for accelerating deep convolutional neural networks , 2017, Design, Automation & Test in Europe Conference & Exhibition (DATE), 2017.

[14]  Ming Yang,et al.  Compressing Deep Convolutional Networks using Vector Quantization , 2014, ArXiv.