Comparing Humans and Automatic Speech Recognition Systems in Recognizing Dysarthric Speech

Speech is a complex process that requires control and coordination of articulation, breathing, voicing, and prosody. Dysarthria is a manifestation of an inability to control and coordinate one or more of these aspects, which results in poorly articulated and barely intelligible speech. Hence, individuals with dysarthria are rarely understood by human listeners. In this paper, we compare and evaluate how well dysarthric speech can be recognized by an automatic speech recognition (ASR) system and by naive adult human listeners. The results show that, despite the encouraging performance of ASR systems and contrary to the claims in other studies, human listeners on average perform better in recognizing single-word dysarthric speech. In particular, the mean word recognition accuracy of speaker-adapted monophone ASR systems on stimuli produced by six dysarthric speakers is 68.39%, while the mean percentage of correct responses from 14 naive human listeners on the same speech is 79.78%, as evaluated using a single-word multiple-choice intelligibility test.