Improved acoustic modeling for automatic dysarthric speech recognition

Dysarthria is a neuromuscular disorder that arises from impaired coordination of the speech musculature. Assistive technology based on automatic speech recognition (ASR) is gaining importance as a way to improve the quality of life of people with speech disorders. Because it is difficult for dysarthric speakers to provide sufficient recordings, data insufficiency is one of the major obstacles to building an effective dysarthric ASR system. In this paper, we address this issue by pooling data from an unimpaired speech database. A feature-space maximum likelihood linear regression (fMLLR) transformation is then applied to both the pooled data and the dysarthric data to normalize inter-speaker variability. The acoustic model built on the combined features (acoustically transformed dysarthric + pooled features) yields relative improvements of 18.09% and 50.00% over the baseline system on the Nemours database and the Universal Access Speech (digit set) database, respectively.
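The fMLLR step described above normalizes speaker variability by applying a per-speaker affine transform to each acoustic feature vector, x̂ = A·x + b, before acoustic model training. The sketch below illustrates only this feature-space transform; the matrix `A`, bias `b`, feature dimension, and frame counts are illustrative placeholders, not values estimated in the paper (in practice the transform is estimated by maximum likelihood, e.g. with a toolkit such as Kaldi).

```python
import numpy as np

def apply_fmllr(features, A, b):
    """Apply an fMLLR-style affine transform to a (frames, dim) feature matrix.

    Each frame x is mapped to A @ x + b, normalizing speaker-specific
    variation before the frames are pooled for acoustic model training.
    """
    return features @ A.T + b

rng = np.random.default_rng(0)
dim = 13                                   # e.g. MFCC dimension (illustrative)
frames = rng.standard_normal((100, dim))   # toy stand-in for pooled + dysarthric frames

# Placeholder transform: identity matrix and zero bias stand in for the
# maximum-likelihood estimate a real fMLLR estimation pass would produce.
A = np.eye(dim)
b = np.zeros(dim)

normalized = apply_fmllr(frames, A, b)
```

With a real (non-identity) estimated transform, the normalized frames from impaired and unimpaired speakers can be concatenated into a single training set, which is the combined-feature setup the abstract evaluates.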
