An SRAM Optimized Approach for Constant Memory Consumption and Ultra-fast Execution of ML Classifiers on TinyML Hardware

With the introduction of ultra-low-power machine learning (TinyML), IoT devices are becoming smarter as they are driven by machine learning (ML) models. However, any increase in the training data results in a linear increase in the space complexity of the ML models, which makes such models highly challenging to deploy on IoT devices with limited memory (TinyML hardware). To alleviate these memory issues, in this paper we present an SRAM-optimized approach for porting, stitching, and efficiently deploying classifiers. The proposed method enables large classifiers to be comfortably executed on microcontroller unit (MCU) based IoT devices and to perform ultra-fast classifications while consuming 0 bytes of SRAM. We tested our SRAM-optimized approach by using it to port and execute 7 dataset-trained classifiers on 7 popular MCU boards, and we report their inference time and memory (Flash and SRAM) consumption. The experimental results show that (i) the classifiers ported using our approach vary in size but have constant SRAM consumption, which enabled the deployment of larger ML classifier models even on the tiny ATmega328P MCU-based Arduino Nano, which has only 2 kB of SRAM; (ii) even the resource-constrained 8-bit MCUs performed unit inference faster (in less than a millisecond) than an NVIDIA Jetson Nano GPU and a Raspberry Pi 4 CPU; and (iii) the majority of models produced inference results 1-4x faster than the models ported by the sklearn-porter, m2cgen, and emlearn libraries.
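The abstract does not spell out how the ported classifiers avoid using SRAM. One common way to keep a model's parameters out of SRAM on an MCU is to emit the trained model as C code, either as nested if/else branches whose thresholds are compiled into the instructions, or as constant tables placed explicitly in flash. The sketch below illustrates both for a small, hypothetical depth-2 decision tree. It is not the authors' implementation: the tree, its thresholds, and the names `tree_predict_ifelse`, `tree_predict_table`, `READ_FLOAT`, `READ_I8` and `TREE_NODES` are assumptions made for illustration. On AVR parts such as the ATmega328P, avr-gcc copies ordinary `const` globals into SRAM at start-up, so the table variant uses avr-libc's `PROGMEM` and `pgm_read_*` macros; on other targets it falls back to plain reads.

```c
#include <stdint.h>
#include <assert.h>

/* On AVR (e.g. ATmega328P), const globals are copied to SRAM at start-up unless
 * they are placed in flash with PROGMEM and read back with pgm_read_*().
 * On other targets (many 32-bit MCUs typically link const data into flash
 * already, or a desktop build) we assume plain reads are sufficient. */
#if defined(__AVR__)
#include <avr/pgmspace.h>
#define READ_FLOAT(p) pgm_read_float(p)
#define READ_I8(p)    ((int8_t)pgm_read_byte(p))
#else
#define PROGMEM
#define READ_FLOAT(p) (*(p))
#define READ_I8(p)    (*(p))
#endif

/* Variant 1 (hypothetical tree): the model "ported" as nested if/else.
 * Thresholds and class labels become immediate operands or literal constants
 * in the program code, so no model arrays are needed in SRAM. */
uint8_t tree_predict_ifelse(const float *x)
{
    if (x[0] <= 2.5f) {
        return (x[1] <= 1.0f) ? 0 : 1;
    }
    return (x[1] <= 4.0f) ? 1 : 2;
}

/* Variant 2: the same tree as flat node tables kept in flash.
 * TREE_FEATURE[i] < 0 marks a leaf whose class label is stored in TREE_VALUE[i];
 * for internal nodes TREE_VALUE[i] holds the split threshold. */
#define TREE_NODES 7
static const int8_t TREE_FEATURE[TREE_NODES] PROGMEM = { 0, 1, 1, -1, -1, -1, -1 };
static const float  TREE_VALUE[TREE_NODES]   PROGMEM = { 2.5f, 1.0f, 4.0f, 0.0f, 1.0f, 1.0f, 2.0f };
static const int8_t TREE_LEFT[TREE_NODES]    PROGMEM = { 1, 3, 5, -1, -1, -1, -1 };
static const int8_t TREE_RIGHT[TREE_NODES]   PROGMEM = { 2, 4, 6, -1, -1, -1, -1 };

uint8_t tree_predict_table(const float *x)
{
    int8_t node = 0;
    int8_t feat;
    /* Walk from the root to a leaf; only the locals below live on the stack. */
    while ((feat = READ_I8(&TREE_FEATURE[node])) >= 0) {
        float threshold = READ_FLOAT(&TREE_VALUE[node]);
        node = (x[feat] <= threshold) ? READ_I8(&TREE_LEFT[node])
                                      : READ_I8(&TREE_RIGHT[node]);
        /* Debug guard against a malformed table; disable with -DNDEBUG. */
        assert(node >= 0 && node < TREE_NODES);
    }
    return (uint8_t)READ_FLOAT(&TREE_VALUE[node]);
}
```

Both functions return the same class for the same input; for example, the feature vector `{5.0f, 9.0f}` maps to class 2. The if/else form trades flash size for the absence of any table lookups, while the table form keeps the code small and the parameters easy to regenerate. In either case only the model parameters stay out of SRAM; the feature vector and the call stack still occupy RAM at run time.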

[1] Vikas Chandra,et al. CMSIS-NN: Efficient Neural Network Kernels for Arm Cortex-M CPUs , 2018, ArXiv.

[2] John G. Breslin,et al. Air Quality Sensor Network Data Acquisition, Cleaning, Visualization, and Analytics: A Real-world IoT Use Case , 2021, UbiComp/ISWC Adjunct.

[3] Mohak Shah,et al. On-Device Machine Learning: An Algorithms and Learning Theory Perspective , 2019, ArXiv.

[4] John G. Breslin,et al. ML-MCU: A Framework to Train ML Classifiers on MCU-Based IoT Edge Devices , 2021, IEEE Internet of Things Journal.

[5] K. Haigh,et al. Machine Learning for Embedded Systems : A Case Study , 2015 .

[6] John G. Breslin,et al. OWSNet: Towards Real-time Offensive Words Spotting Network for Consumer IoT Devices , 2021, 2021 IEEE 7th World Forum on Internet of Things (WF-IoT).

[7] Darko Anicic,et al. TinyOL: TinyML with Online-Learning on Microcontrollers , 2021, 2021 International Joint Conference on Neural Networks (IJCNN).

[8] Wei-Yin Loh,et al. A Comparison of Prediction Accuracy, Complexity, and Training Time of Thirty-Three Old and New Classification Algorithms , 2000, Machine Learning.

[9] Peter Corcoran,et al. Smart Speaker Design and Implementation with Biometric Authentication and Advanced Voice Interaction Capability , 2022, AICS.

[10] Saurabh Goyal,et al. Resource-efficient Machine Learning in 2 KB RAM for the Internet of Things , 2017, ICML.

[11] J. Andrew Bagnell,et al. SpeedBoost: Anytime Prediction with Uniform Near-Optimality , 2012, AISTATS.

[12] Venkatesh Saligrama,et al. Pruning Random Forests for Prediction on a Budget , 2016, NIPS.

[13] John G. Breslin,et al. Edge2Guard: Botnet Attacks Detecting Offline Models for Resource-Constrained IoT Devices , 2021, 2021 IEEE International Conference on Pervasive Computing and Communications Workshops and other Affiliated Events (PerCom Workshops).

[14] Andreas Spanias,et al. Integrating machine learning in embedded sensor systems for Internet-of-Things applications , 2016, 2016 IEEE International Symposium on Signal Processing and Information Technology (ISSPIT).

[15] Michael Garland,et al. A Programmable Approach to Model Compression , 2019, ArXiv.

[16] B. Sudharsan. Machine Learning Meets Internet of Things: From Theory to Practice , 2021 .

[17] Prem Prakash Jayaraman,et al. Toward Distributed, Global, Deep Learning Using IoT Devices , 2021, IEEE Internet Computing.

[18] Luca Benini,et al. FANN-on-MCU: An Open-Source Toolkit for Energy-Efficient Neural Network Inference at the Edge of the Internet of Things , 2020, IEEE Internet of Things Journal.

[19] John G. Breslin,et al. ElastiCL: Elastic Quantization for Communication Efficient Collaborative Learning in IoT , 2021, SenSys.

[20] Luca Benini,et al. PULP: A parallel ultra low power platform for next generation IoT applications , 2015, 2015 IEEE Hot Chips 27 Symposium (HCS).

[21] Bharath Sudharsan,et al. AI Vision: Smart speaker design and implementation with object detection custom skill and advanced voice interaction capability , 2019, 2019 11th International Conference on Advanced Computing (ICoAC).

[22] Goutham Kamath,et al. Pushing Analytics to the Edge , 2016, 2016 IEEE Global Communications Conference (GLOBECOM).

[23] Muhammad Intizar Ali,et al. Edge2Train: a framework to train machine learning models (SVMs) on resource-constrained IoT edge devices , 2020, IOT.

[24] Nicholas D. Lane,et al. Sparsification and Separation of Deep Learning Layers for Constrained Resource Inference on Wearables , 2016, SenSys.

[25] Sebastian Nowozin,et al. Decision Jungles: Compact and Rich Models for Classification , 2013, NIPS.

[26] Ben Y. Zhao,et al. Complexity vs. performance: empirical analysis of machine learning as a service , 2017, Internet Measurement Conference.

[27] P. K. Sinha,et al. Pruning of Random Forest classifiers: A survey and future directions , 2012, 2012 International Conference on Data Science & Engineering (ICDSE).

[28] John G. Breslin,et al. Train++: An Incremental ML Model Training Algorithm to Create Self-Learning IoT Devices , 2021, 2021 IEEE SmartWorld, Ubiquitous Intelligence & Computing, Advanced & Trusted Computing, Scalable Computing & Communications, Internet of People and Smart City Innovation (SmartWorld/SCALCOM/UIC/ATC/IOP/SCI).

[29] Prateek Jain,et al. ProtoNN: Compressed and Accurate kNN for Resource-scarce Devices , 2017, ICML.

[30] Muhammad Intizar Ali,et al. Adaptive Strategy to Improve the Quality of Communication for IoT Edge Devices , 2020, 2020 IEEE 6th World Forum on Internet of Things (WF-IoT).

[31] Gustavo E. A. P. A. Batista,et al. EmbML Tool: Supporting the use of Supervised Learning Algorithms in Low-Cost Embedded Systems , 2019, 2019 IEEE 31st International Conference on Tools with Artificial Intelligence (ICTAI).

[32] Dan Alistarh,et al. Model compression via distillation and quantization , 2018, ICLR.

[33] Soheil Ghiasi,et al. Hardware-oriented Approximation of Convolutional Neural Networks , 2016, ArXiv.

[34] Pinto Rafael,et al. Breast Cancer Dataset , 2015 .

[35] Matthew Mattina,et al. MicroNets: Neural Network Architectures for Deploying TinyML Applications on Commodity Microcontrollers , 2020, MLSys.

[36] Song Han,et al. Deep Compression: Compressing Deep Neural Network with Pruning, Trained Quantization and Huffman Coding , 2015, ICLR.

[37] John G. Breslin,et al. Enabling Machine Learning on the Edge Using SRAM Conserving Efficient Neural Networks Execution Approach , 2021, ECML/PKDD.

[38] Muhammad Intizar Ali,et al. Avoid Touching Your Face: A Hand-to-face 3D Motion Dataset (COVID-away) and Trained Models for Smartwatches , 2020, IOT Companion.

[39] Luca Benini,et al. CMix-NN: Mixed Low-Precision CNN Library for Memory-Constrained Edge Devices , 2020, IEEE Transactions on Circuits and Systems II: Express Briefs.