NS-FDN: Near-Sensor Processing Architecture of Feature-Configurable Distributed Network for Beyond-Real-Time Always-on Keyword Spotting

Always-on keyword spotting (KWS), which detects wake-up words, has become an indispensable module in voice interaction systems. However, ultra-low-power embedded devices impose strict requirements on the energy consumption, latency, and recognition accuracy of KWS. In this work, we propose a near-sensor processing architecture of feature-configurable distributed network (NS-FDN) for always-on KWS applications. The proposed distributed network splits the conventional single network into distributed sub-networks, which lets it adapt to the flexible keyword demands of real-world scenarios. We design a channel-independent training framework to improve the recognition accuracy of the distributed networks. NS-FDN evaluates the speech features and reduces their redundancy. It can also configure the speech features to further reduce computational complexity and improve processing speed. For deeper optimization, we implement a 65nm-process prototype chip with a near-sensor mixed-signal processing architecture that avoids the energy-consuming analog-to-digital converter. By co-optimizing the system, algorithm, and hardware design of KWS, our architecture eliminates the energy consumption bottleneck that has long existed in conventional KWS systems and achieves state-of-the-art system performance. Compared with the state of the art, the experimental results show that NS-FDN achieves 31.6% energy savings, 1.6 times memory savings, a 57 times speedup, and 3.4% higher recognition accuracy.
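To make the idea of "splitting a single network into distributed sub-networks" with "configurable features" more concrete, here is a minimal sketch under loudly stated assumptions. It is not the authors' design. We assume that each sub-network is an independent binary detector for one keyword, modeled here as a single logistic unit. We also assume that a "feature configuration" is simply the subset of feature channels a sub-network reads. The names `SubNetwork`, `DistributedKWS`, `channels`, and `macs` are hypothetical. The sketch only shows two consequences of that structure: keywords can be added or removed without touching the other detectors, and reading fewer feature channels directly lowers per-frame multiply-accumulate (MAC) work.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass
class SubNetwork:
    """Hypothetical per-keyword detector (assumption: one logistic unit per keyword)."""
    keyword: str
    channels: List[int]   # assumed feature configuration: which feature channels are read
    weights: List[float]  # one weight per configured channel
    bias: float = 0.0

    def __post_init__(self) -> None:
        if len(self.channels) != len(self.weights):
            raise ValueError("need exactly one weight per configured channel")

    def score(self, features: Sequence[float]) -> float:
        # Only the configured channels are read; unused channels cost no MACs.
        z = self.bias + sum(w * features[c] for w, c in zip(self.weights, self.channels))
        return 1.0 / (1.0 + math.exp(-z))

    def macs(self) -> int:
        # Multiply-accumulate operations per feature frame for this sub-network.
        return len(self.channels)


class DistributedKWS:
    """Hypothetical container of independent sub-networks, one per keyword."""

    def __init__(self) -> None:
        self.subnets: Dict[str, SubNetwork] = {}

    def add(self, net: SubNetwork) -> None:
        # Adding a keyword does not modify any existing sub-network.
        self.subnets[net.keyword] = net

    def remove(self, keyword: str) -> None:
        self.subnets.pop(keyword, None)

    def detect(self, features: Sequence[float], threshold: float = 0.5) -> List[str]:
        # Every sub-network decides on its own; the results are collected in sorted order.
        return sorted(k for k, n in self.subnets.items() if n.score(features) >= threshold)

    def total_macs(self) -> int:
        return sum(n.macs() for n in self.subnets.values())


if __name__ == "__main__":
    kws = DistributedKWS()
    kws.add(SubNetwork("yes", channels=[0, 2], weights=[4.0, 4.0], bias=-4.0))
    kws.add(SubNetwork("no", channels=[1, 3, 5], weights=[3.0, 3.0, 3.0], bias=-6.0))
    frame = [1.0, 0.0, 1.0, 0.0, 0.0, 0.0]  # toy 6-channel feature frame
    print(kws.detect(frame))   # ['yes']  ("yes": z = 4 -> 0.98, "no": z = -6 -> 0.0025)
    print(kws.total_macs())    # 5
    kws.remove("no")           # drop a keyword without retraining "yes"
    print(kws.total_macs())    # 2
```

In this toy model, shrinking a sub-network's `channels` list is the only knob for reducing compute. A real system would use trained multi-layer sub-networks and real speech features such as MFCCs. The abstract does not specify how NS-FDN selects or configures its features.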
