OWSNet: Towards Real-time Offensive Words Spotting Network for Consumer IoT Devices

Every modern household owns at least a dozen IoT devices, such as smart speakers, video doorbells, and smartwatches, most of which are equipped with a keyword spotting (KWS) based digital voice assistant like Alexa. State-of-the-art KWS systems require a large number of operations and substantial computation and memory resources to reach top performance. In this paper, in contrast to existing resource-demanding KWS systems, we propose a lightweight temporal-convolution-based KWS system named OWSNet that can comfortably execute on a variety of IoT devices and accurately spot multiple keywords in real time without disturbing the device’s routine functionalities. When OWSNet is deployed on consumer IoT devices placed in the workplace, home, etc., it can accurately spot offensive words in real time in addition to wake/trigger words like ‘Hey Siri’ and ‘Alexa’. If a regular wake word is spotted, it activates the voice assistant; if an offensive word is spotted, it starts to capture and stream audio data to speech analytics APIs for autonomous detection of threats and insecurities in the scene. The evaluation results show that OWSNet is faster than state-of-the-art models, producing $\approx$ 1-74 times faster inference on Raspberry Pi 4 and $\approx$ 1-12 times faster inference on NVIDIA Jetson Nano. To optimize IoT use-case models like OWSNet, we also present a generic multi-component ML model optimization sequence that can reduce the memory and computation demands of a wide range of ML models, thus enabling their execution on low-resource, low-cost, low-power IoT devices.
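
The two-way behaviour described above (wake word activates the assistant, offensive word triggers audio capture and streaming) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the names `Action`, `WAKE_WORDS`, `OFFENSIVE_WORDS`, and `dispatch`, the placeholder offensive labels, and the confidence threshold are all assumptions introduced here, and the KWS model itself is assumed to exist elsewhere and to return a label with a score.

```python
from enum import Enum


class Action(Enum):
    IGNORE = "ignore"
    ACTIVATE_ASSISTANT = "activate_assistant"
    STREAM_AUDIO = "stream_audio"


# Hypothetical label sets; the abstract names only 'Hey Siri' and 'Alexa'.
WAKE_WORDS = {"hey siri", "alexa"}
OFFENSIVE_WORDS = {"<offensive-word-1>", "<offensive-word-2>"}  # placeholders

# Assumed confidence cut-off; the abstract does not specify one.
DEFAULT_THRESHOLD = 0.8


def dispatch(label: str, score: float, threshold: float = DEFAULT_THRESHOLD) -> Action:
    """Map one keyword-spotting result to the action described in the abstract."""
    if score < threshold:
        # Low-confidence detections are dropped (an assumption of this sketch).
        return Action.IGNORE
    key = label.strip().lower()
    if key in WAKE_WORDS:
        # Regular wake word: hand over to the voice assistant.
        return Action.ACTIVATE_ASSISTANT
    if key in OFFENSIVE_WORDS:
        # Offensive word: start capturing and streaming audio for analysis.
        return Action.STREAM_AUDIO
    return Action.IGNORE
```

In a deployment, the returned `Action` would be consumed by device-specific code, for example to wake the assistant or to open an audio stream to a speech analytics service; those integrations are outside the scope of this sketch.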
