Arbitrary-Precision Convolutional Neural Networks on Low-Power IoT Processors

Deploying Convolutional Neural Networks (CNNs) on resource-constrained IoT devices calls for careful model re-sizing and optimization. Among the proposed compression strategies, n-ary fixed-point quantization has proven effective in reducing both computational effort and memory footprint with no (or limited) accuracy loss. However, it requires custom hardware components and special memory allocation strategies that are either unavailable on low-power/low-cost cores or burdensome to implement on them. To bridge this gap, this work introduces Virtual Quantization (VQ), a hardware-friendly compression method that makes it possible to implement equivalent n-ary CNNs on general-purpose instruction-set architectures. The proposed VQ framework is validated on the ARM Cortex-M family of IoT MCUs and tested on three real-life applications: Image Classification, Keyword Spotting, and Facial Expression Recognition.
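As background for the n-ary quantization the abstract refers to, the sketch below illustrates one common instance, ternary (3-ary) weight quantization with a magnitude threshold. This is a generic illustration of the quantization family, not the paper's VQ method; the threshold factor `delta_scale` and the function name are hypothetical choices for this example.

```python
import numpy as np

def ternarize(weights, delta_scale=0.7):
    """Map full-precision weights to {-alpha, 0, +alpha} (generic ternary scheme)."""
    # Per-tensor threshold proportional to the mean magnitude (hypothetical factor)
    delta = delta_scale * np.mean(np.abs(weights))
    q = np.zeros_like(weights)
    q[weights > delta] = 1.0
    q[weights < -delta] = -1.0
    # Scale factor: mean magnitude of the weights that survive the threshold
    mask = q != 0
    alpha = np.mean(np.abs(weights[mask])) if mask.any() else 0.0
    return alpha * q, alpha
```

Because every retained weight collapses to a single shared magnitude, the multiply in a convolution reduces to sign-selected additions plus one scaling per output, which is the source of the compute and memory savings the abstract mentions.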
