NS-FDN: Near-Sensor Processing Architecture of Feature-Configurable Distributed Network for Beyond-Real-Time Always-on Keyword Spotting

Always-on keyword spotting (KWS), which detects wake-up words, has become an indispensable module in voice interaction systems. However, ultra-low-power embedded devices impose strict requirements on the energy consumption, latency, and recognition accuracy of KWS. In this work, we propose a near-sensor processing architecture of a feature-configurable distributed network (NS-FDN) for always-on KWS applications. The proposed distributed network adapts to flexible keyword demands in real-world scenarios by splitting the conventional single network into distributed sub-networks. We design a channel-independent training framework to improve the recognition accuracy of the distributed networks. NS-FDN evaluates the speech features and reduces their redundancy, and it can also configure the speech features to further reduce computational complexity and improve processing speed. For deeper optimization, we implement a prototype chip in a 65 nm process with a near-sensor mixed-signal processing architecture that avoids an energy-consuming analog-to-digital converter. By jointly improving the system, algorithm, and hardware designs of KWS, our co-optimized architecture eliminates the long-standing energy consumption bottleneck of conventional KWS systems and achieves state-of-the-art system performance. Experimental results show that NS-FDN achieves 31.6% energy savings, 1.6 times memory savings, a 57 times speedup, and 3.4% higher recognition accuracy compared with the state of the art.
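As a rough illustration of the two ideas named above (one sub-network per keyword, and a configurable subset of speech features), the following Python sketch may help. It is not the NS-FDN implementation: the keyword list, the feature length `N_FEATURES`, the single-layer sigmoid detectors, the random untrained weights, and the helper names `make_subnetwork`, `score`, and `spot` are all assumptions made for illustration. It only shows how enabling fewer keywords and fewer features reduces the multiply-accumulate (MAC) count.

```python
import math
import random

N_FEATURES = 8  # hypothetical length of one speech feature vector


def make_subnetwork(seed):
    """Return (weights, bias) for one tiny per-keyword detector (random, untrained)."""
    rng = random.Random(seed)
    weights = [rng.uniform(-1.0, 1.0) for _ in range(N_FEATURES)]
    bias = rng.uniform(-1.0, 1.0)
    return weights, bias


def score(subnet, features, feature_mask):
    """Sigmoid score over the enabled features only; also returns the MAC count."""
    weights, bias = subnet
    z = bias
    macs = 0
    for w, x, keep in zip(weights, features, feature_mask):
        if keep:  # disabled features cost no computation
            z += w * x
            macs += 1
    return 1.0 / (1.0 + math.exp(-z)), macs


def spot(subnets, enabled_keywords, features, feature_mask):
    """Run only the sub-networks of the currently enabled keywords."""
    results = {}
    total_macs = 0
    for kw in enabled_keywords:
        p, macs = score(subnets[kw], features, feature_mask)
        results[kw] = p
        total_macs += macs
    return results, total_macs


# Hypothetical keyword set; each keyword gets its own independent sub-network.
keywords = ["yes", "no", "stop", "go"]
subnets = {kw: make_subnetwork(i) for i, kw in enumerate(keywords)}

features = [0.1 * i for i in range(N_FEATURES)]      # placeholder feature vector
full_mask = [True] * N_FEATURES                      # all features enabled
half_mask = [i % 2 == 0 for i in range(N_FEATURES)]  # every second feature enabled

all_scores, all_macs = spot(subnets, keywords, features, full_mask)
two_scores, two_macs = spot(subnets, ["yes", "stop"], features, half_mask)
print(all_macs, two_macs)
```

In this toy setting, running two of the four sub-networks on half of the features cuts the MAC count by a factor of four. The savings reported in the abstract come from the authors' co-optimized system and chip, not from this sketch.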
