Speech technology-based assessment of phoneme intelligibility in dysarthria.

BACKGROUND Currently, clinicians mainly rely on perceptual judgements to assess the intelligibility of dysarthric speech. Although often highly reliable, this procedure is subjective and influenced by many intrinsic variables. A speech technology-based intelligibility assessment can therefore be expected to offer certain benefits. Previous attempts to develop an automated intelligibility assessment mainly relied on automatic speech recognition (ASR) systems that were trained to recognize the speech of persons without known impairments. In this paper, automatic speech alignment (ASA) systems are used instead. In addition, previous attempts only made use of phonemic features (PMF). However, since articulation is an important contributing factor to the intelligibility of dysarthric speech and since phonological features (PLF) are shared by multiple phonemes, phonological features may be more appropriate to characterize and identify dysarthric phonemes.

AIMS To investigate the reliability of objective phoneme intelligibility scores obtained by three types of intelligibility models: models using only phonemic features yielded by an automated speech aligner (PMF models), models using only phonological features (PLF models), and models using a combination of phonemic and phonological features (PMF + PLF models).

METHODS & PROCEDURES Correlations were calculated between the objective phoneme intelligibility scores of 60 dysarthric speakers and the corresponding perceptual phoneme intelligibility scores obtained by a standardized perceptual phoneme intelligibility assessment.

OUTCOMES & RESULTS The correlations between the objective and perceptual intelligibility scores were 0.793 for the PMF models, 0.828 for the PLF models, and 0.943 for the PMF + PLF models. The features selected to obtain these high correlations can be divided into six main subgroups: (1) vowel-related phonemic and phonological features, (2) lateral-related features, (3) silence-related features, (4) fricative-related features, (5) velar-related features and (6) plosive-related features.

CONCLUSIONS & IMPLICATIONS The phoneme intelligibility scores of dysarthric speakers obtained by the three investigated intelligibility model types are reliable. The highest correlation between the perceptual and objective intelligibility scores was found for models combining phonemic and phonological features. The intelligibility scoring system is now ready to be implemented in a clinical tool.
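
The comparison of objective and perceptual scores described under METHODS & PROCEDURES can be illustrated with a minimal sketch. The abstract does not state which correlation coefficient was used, so Pearson's r is an assumption here. The function name `pearson_r` and the five pairs of speaker scores are hypothetical and are not taken from the study.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equally long score lists.

    Assumption: the abstract does not name the coefficient it used;
    Pearson's r is chosen here purely for illustration.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long lists with at least 2 scores")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-products and sums of squared deviations from the means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant scores")
    return cov / sqrt(var_x * var_y)


# Hypothetical scores for five speakers (illustrative only, not study data):
# one perceptual score and one model-based objective score per speaker.
perceptual_scores = [92.0, 75.0, 60.0, 48.0, 81.0]
objective_scores = [88.0, 79.0, 55.0, 52.0, 77.0]

r = pearson_r(objective_scores, perceptual_scores)
print(f"correlation between objective and perceptual scores: {r:.3f}")
```

In this sketch, each model type (PMF, PLF, PMF + PLF) would supply its own list of objective scores, and the resulting coefficients would be compared across model types.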
