Deep Autoencoder Based Speech Features for Improved Dysarthric Speech Recognition

Dysarthria is a motor speech disorder that results in mumbled, slurred, or slow speech that is generally difficult for both humans and machines to understand. Traditional automatic speech recognition (ASR) systems perform poorly on dysarthric speech. In this paper, we propose using deep autoencoders to enhance Mel Frequency Cepstral Coefficient (MFCC) based features in order to improve dysarthric speech recognition. Speech from healthy control speakers is used to train an autoencoder, which is in turn used to obtain an improved feature representation for dysarthric speech. Additionally, we analyze the use of severity-based tempo adaptation followed by autoencoder-based speech feature enhancement. All evaluations were carried out on the Universal Access dysarthric speech corpus. An overall absolute improvement of 16% was achieved using tempo adaptation followed by the autoencoder-based speech front-end representation for DNN-HMM based dysarthric speech recognition.
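The core idea, training an autoencoder on features from healthy speakers and then passing dysarthric features through it, can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the 13-dimensional frames are synthetic stand-ins for MFCCs, the network has a single hidden layer rather than a deep stack, and the names `make_frames`, `train_autoencoder`, and `enhance` are hypothetical. It is not the architecture or training recipe used in the paper, and it omits tempo adaptation and the DNN-HMM recognizer entirely.

```python
import numpy as np

rng = np.random.default_rng(0)
N_MFCC = 13  # assumed feature dimension; the abstract does not state it


def make_frames(n, mixing, noise):
    """Synthetic stand-in for MFCC frames: low-rank structure plus noise."""
    latent = rng.normal(size=(n, mixing.shape[0]))
    return latent @ mixing + noise * rng.normal(size=(n, mixing.shape[1]))


# Synthetic data only: "healthy" frames are clean, "dysarthric" frames are noisier.
mixing = rng.normal(size=(4, N_MFCC))
healthy = make_frames(2000, mixing, noise=0.1)
dysarthric = make_frames(500, mixing, noise=1.0)

# Normalize both sets with statistics estimated from healthy speech.
mean, std = healthy.mean(axis=0), healthy.std(axis=0)
healthy_n = (healthy - mean) / std
dysarthric_n = (dysarthric - mean) / std


def train_autoencoder(x, hidden=8, epochs=500, lr=0.2):
    """Fit a one-hidden-layer autoencoder by full-batch gradient descent on MSE."""
    n, d = x.shape
    w1 = rng.normal(scale=0.1, size=(d, hidden))
    b1 = np.zeros(hidden)
    w2 = rng.normal(scale=0.1, size=(hidden, d))
    b2 = np.zeros(d)
    losses = []
    for _ in range(epochs):
        h = np.tanh(x @ w1 + b1)          # encoder
        y = h @ w2 + b2                   # linear decoder
        err = y - x
        losses.append(float(np.mean(err ** 2)))
        g_y = 2.0 * err / err.size        # d(MSE)/dy
        g_h = g_y @ w2.T                  # backprop through decoder
        g_z = g_h * (1.0 - h ** 2)        # backprop through tanh
        w2 -= lr * (h.T @ g_y)
        b2 -= lr * g_y.sum(axis=0)
        w1 -= lr * (x.T @ g_z)
        b1 -= lr * g_z.sum(axis=0)
    return (w1, b1, w2, b2), losses


def enhance(x, params):
    """Map features through the autoencoder trained on healthy speech."""
    w1, b1, w2, b2 = params
    return np.tanh(x @ w1 + b1) @ w2 + b2


params, losses = train_autoencoder(healthy_n)
enhanced = enhance(dysarthric_n, params)
print(f"reconstruction MSE on healthy frames: {losses[0]:.3f} -> {losses[-1]:.3f}")
print("enhanced dysarthric features:", enhanced.shape)
```

In a real pipeline the `enhanced` array would replace the raw dysarthric features as input to the recognizer; whether that helps depends on how well the healthy-speech model generalizes, which this toy example does not measure.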
