MicroNets: Neural Network Architectures for Deploying TinyML Applications on Commodity Microcontrollers

Executing machine learning workloads locally on resource-constrained microcontrollers (MCUs) promises to drastically expand the application space of IoT. However, so-called TinyML presents severe technical challenges, as deep neural network inference demands a large compute and memory budget. To address this challenge, neural architecture search (NAS) can help design accurate ML models that meet the tight MCU memory, latency, and energy constraints. A key component of NAS algorithms is their latency/energy model, i.e., the mapping from a given neural network architecture to its inference latency/energy on an MCU. In this paper, we observe an intriguing property of NAS search spaces for MCU model design: on average, model latency varies linearly with model operation (op) count under a uniform prior over models in the search space. Exploiting this insight, we employ differentiable NAS (DNAS) to search for models with low memory usage and low op count, where op count is treated as a viable proxy for latency. Experimental results validate our methodology, yielding our MicroNet models, which we deploy on MCUs using TensorFlow Lite Micro, a standard open-source NN inference runtime widely used in the TinyML community. MicroNets demonstrate state-of-the-art results for all three TinyMLperf industry-standard benchmark tasks: visual wake words, audio keyword spotting, and anomaly detection.
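
The linear relationship between op count and latency described above can be illustrated with a minimal sketch. It samples small convolutional models uniformly from a toy search space, counts their ops, and fits a least-squares line to latency versus op count. Everything specific here is an assumption made for illustration and is not taken from the paper: the toy search space, the convention of counting two ops per multiply-accumulate, the constants `ASSUMED_SECONDS_PER_OP` and `ASSUMED_OVERHEAD_S`, and above all the latencies, which are synthetic values generated by the script rather than MCU measurements.

```python
import random


def conv2d_ops(h_out, w_out, c_in, c_out, k):
    """Op count of a standard k x k convolution (assumed: 2 ops per MAC)."""
    return 2 * h_out * w_out * c_in * c_out * k * k


def model_ops(channels, size=32, k=3):
    """Total ops of a stack of same-padded, stride-1 convs on an RGB input."""
    total, c_in = 0, 3
    for c_out in channels:
        total += conv2d_ops(size, size, c_in, c_out, k)
        c_in = c_out
    return total


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(xs, ys, slope, intercept):
    """Coefficient of determination of the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# Hypothetical constants: NOT measured on any real MCU.
ASSUMED_SECONDS_PER_OP = 1e-8
ASSUMED_OVERHEAD_S = 1e-3

rng = random.Random(0)
widths = [8, 16, 32, 64]

# Uniform prior over the toy search space: each layer width is drawn uniformly.
ops = [model_ops([rng.choice(widths) for _ in range(3)]) for _ in range(200)]

# Synthetic latency: linear in ops, with +/-5% per-model multiplicative noise
# standing in for layer-dependent efficiency differences.
latency = [
    ASSUMED_OVERHEAD_S + o * ASSUMED_SECONDS_PER_OP * rng.uniform(0.95, 1.05)
    for o in ops
]

slope, intercept = fit_line(ops, latency)
r2 = r_squared(ops, latency, slope, intercept)
print(f"seconds/op ~ {slope:.3e}, overhead ~ {intercept:.3e} s, R^2 = {r2:.4f}")
```

On real hardware, the `latency` list would be replaced by measured inference times for each sampled model; a high R^2 in that setting is what would justify using op count as a differentiable proxy for latency in the search objective.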
