Enhancing Voice Wake-Up for Dysarthria: Mandarin Dysarthria Speech Corpus Release and Customized System Design

Smart home technology has been widely adopted, allowing users to control devices effortlessly through voice commands. However, individuals with dysarthria, a motor speech disorder, face challenges with such systems because of the variability of their speech. This paper addresses the wake-up word spotting (WWS) task for dysarthric individuals, with the aim of making voice-controlled applications usable for them in real-world settings. To support this, we release the open-source Mandarin Dysarthria Speech Corpus (MDSC), a dataset designed for dysarthric individuals in home environments. MDSC includes speaker information on age, gender, disease type, and intelligibility evaluations. We further conduct a comprehensive experimental analysis on MDSC that highlights the challenges of this task. Finally, we develop a customized dysarthria WWS system that remains robust across intelligibility levels and achieves strong performance. MDSC will be released at https://www.aishelltech.com/AISHELL_6B.
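The abstract does not specify the MDSC metadata format or the evaluation protocol. As a minimal sketch, assuming a per-speaker record with hypothetical field names (`SpeakerInfo`, `intelligibility` as a free-form label) and a simple recall metric (wake-up utterances detected divided by wake-up utterances spoken), the following Python shows one way detection results could be broken down by intelligibility level to check robustness. The helper `recall_by_intelligibility` is illustrative only, not part of any released toolkit.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SpeakerInfo:
    """Hypothetical per-speaker metadata record; field names are illustrative."""
    speaker_id: str
    age: int
    gender: str
    disease_type: str
    intelligibility: str  # hypothetical label, e.g. "mild" or "severe"


def recall_by_intelligibility(speakers, trials):
    """Return wake-up recall for each intelligibility group.

    `trials` is a list of (speaker_id, detected) pairs, one per spoken
    wake-up word, where `detected` is True if the system fired.
    """
    by_id = {s.speaker_id: s for s in speakers}
    hits = defaultdict(int)
    totals = defaultdict(int)
    for speaker_id, detected in trials:
        group = by_id[speaker_id].intelligibility
        totals[group] += 1
        hits[group] += int(detected)
    # Recall = detected wake-up utterances / all wake-up utterances in the group.
    return {group: hits[group] / totals[group] for group in totals}


# Illustrative data only; disease types are placeholders.
speakers = [
    SpeakerInfo("S01", 54, "M", "type_1", "mild"),
    SpeakerInfo("S02", 61, "F", "type_2", "severe"),
]
trials = [("S01", True), ("S01", True), ("S02", True), ("S02", False)]
print(recall_by_intelligibility(speakers, trials))
```

A gap between groups (here 1.0 for "mild" versus 0.5 for "severe") is the kind of intelligibility-dependent degradation a robust system should narrow. A full WWS evaluation would also track false alarms on non-wake-up audio, which this sketch omits.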
