Work-in-Progress: Quantized NNs as the Definitive Solution for Inference on Low-Power ARM MCUs?

High energy efficiency and a low memory footprint are the key requirements for deploying deep-learning-based analytics on low-power microcontrollers. Here we present work-in-progress results with <tex>$Q$</tex>-bit Quantized Neural Networks (QNNs) deployed on a commercial Cortex-M7 class microcontroller by means of an extension to the ARM CMSIS-NN library. We show that i) for <tex>$Q=4$</tex> and <tex>$Q=2$</tex>, low-memory-footprint QNNs can be deployed with an energy overhead of 30% and 36%, respectively, against 8-bit CMSIS-NN, due to the lack of sub-byte quantization support in the ISA; ii) for <tex>$Q=1$</tex>, native instructions can be used, yielding an energy and latency reduction of ∼3.8× with respect to CMSIS-NN. Our initial results suggest that a small set of QNN-related specialized instructions could improve performance by as much as 7.5× for <tex>$Q=4$</tex>, 13.6× for <tex>$Q=2$</tex>, and 6.5× for binary NNs.
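The asymmetry reported above has a simple low-level explanation: binary (<tex>$Q=1$</tex>) products map directly onto native XOR/popcount operations, whereas <tex>$Q=4$</tex> and <tex>$Q=2$</tex> operands must first be unpacked to 8 bits in software before the existing MAC instructions can consume them. A minimal C sketch of both patterns (a hypothetical illustration, not the paper's actual CMSIS-NN extension kernels):

```c
#include <stdint.h>

/* Q=1: dot product over 32 binarized values packed into one 32-bit word.
   Each bit encodes +1 (bit set) or -1 (bit clear). XNOR marks positions
   where the signs agree, and popcount sums them -- both cheap on-core
   operations, which is why the binary case runs ~3.8x faster. */
static int bnn_dot32(uint32_t act, uint32_t wgt)
{
    uint32_t agree = ~(act ^ wgt);          /* 1 where product is +1 */
    int pos = __builtin_popcount(agree);    /* number of +1 products */
    return 2 * pos - 32;                    /* pos*(+1) + (32-pos)*(-1) */
}

/* Q=4: two signed 4-bit weights share each byte, so every pair costs
   extra shift/mask/sign-extend work before an 8-bit MAC can be issued.
   This software unpacking is the overhead the abstract attributes to
   missing sub-byte support in the ISA. */
static void unpack_q4(const uint8_t *packed, int8_t *out, int n_pairs)
{
    for (int i = 0; i < n_pairs; ++i) {
        uint8_t b = packed[i];
        /* sign-extend each nibble from 4 to 8 bits */
        out[2 * i]     = (int8_t)((int8_t)(b << 4) >> 4); /* low nibble  */
        out[2 * i + 1] = (int8_t)((int8_t)b >> 4);        /* high nibble */
    }
}
```

A specialized instruction set could fold the unpack step into the MAC itself (or operate on packed nibbles directly), which is the mechanism behind the projected 7.5× and 13.6× improvements for <tex>$Q=4$</tex> and <tex>$Q=2$</tex>.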