Adjusting dysarthric speech signals to be more intelligible

Abstract This paper presents a system that transforms the speech signals of speakers with physical speech disabilities into a more intelligible form that listeners can understand more easily. The transformations correct pronunciation errors by removing repeated sounds, inserting deleted sounds, and devoicing improperly voiced phonemes; they also adjust the tempo of speech by phase vocoding and the frequency characteristics of speech by anchor-based morphing of the spectrum. These transformations are motivated by observations of dysarthric articulation, including improper glottal voicing, reduced tongue movement, and reduced energy produced by the lungs. The system is a substantial step towards fully automatic speech transformation that requires no expert or clinical intervention. Among human listeners, recognition rates increased to as much as 191% of those for the original speech (from 21.6% to 41.2%) when the module that corrects pronunciation errors was used. Several types of modified dysarthric speech signals were also supplied to a standard automatic speech recognition system; in that study, the proportion of words correctly recognized increased to as much as 121% of the original rate (from 72.7% to 87.9%) across various parameterizations of the recognizer. This represents a significant advance towards human-to-human assistive communication software and human–computer interaction.
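The tempo adjustment described above relies on phase vocoding [8], which time-scales speech without shifting its pitch by interpolating short-time spectral magnitudes across frames while advancing each frequency bin's phase at its measured rate. The following is a minimal, self-contained sketch of the general technique in Python with NumPy; it is not the authors' implementation, and the window size, hop, and function names are illustrative assumptions.

```python
import numpy as np

def stft(x, win, hop):
    """Short-time Fourier transform with a given analysis window and hop."""
    n = len(win)
    frames = [x[i:i + n] * win for i in range(0, len(x) - n, hop)]
    return np.array([np.fft.rfft(f) for f in frames])

def istft(X, win, hop):
    """Inverse STFT by windowed overlap-add with squared-window normalization."""
    n = len(win)
    total = hop * (len(X) - 1) + n
    y = np.zeros(total)
    norm = np.zeros(total)
    for i, F in enumerate(X):
        s = i * hop
        y[s:s + n] += np.fft.irfft(F, n) * win
        norm[s:s + n] += win ** 2
    return y / np.maximum(norm, 1e-8)

def time_stretch(x, rate, n_fft=1024, hop=256):
    """Phase-vocoder time stretch: rate > 1 speeds speech up, rate < 1 slows it down."""
    win = np.hanning(n_fft)
    X = stft(x, win, hop)
    omega = 2 * np.pi * hop * np.arange(n_fft // 2 + 1) / n_fft  # nominal phase advance per hop
    phase = np.angle(X[0])
    out = []
    for t in np.arange(0, len(X) - 1, rate):  # resampled frame positions
        i = int(t)
        frac = t - i
        # interpolate magnitude between neighbouring analysis frames
        mag = (1 - frac) * np.abs(X[i]) + frac * np.abs(X[i + 1])
        out.append(mag * np.exp(1j * phase))
        # measured phase increment, deviation wrapped to [-pi, pi]
        dphi = np.angle(X[i + 1]) - np.angle(X[i]) - omega
        dphi -= 2 * np.pi * np.round(dphi / (2 * np.pi))
        phase += omega + dphi
    return istft(np.array(out), win, hop)
```

Slowing speech (rate < 1) gives listeners more time per phoneme while preserving the pitch contour; the system described here pairs tempo adjustment with spectral morphing rather than relying on time-scaling alone.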

[1]  John-Paul Hosom,et al.  Improving the intelligibility of dysarthric speech , 2007, Speech Commun..

[2]  Simon King,et al.  Analysis of statistical parametric and unit selection speech synthesis systems applied to emotional speech , 2010, Speech Commun..

[3]  Douglas O'Shaughnessy  Formant Estimation and Tracking , 2008 .

[5]  Hideki Kawahara,et al.  Implementation of realtime STRAIGHT speech manipulation system: Report on its first implementation , 2007 .

[6]  K. Hustad Estimating the Intelligibility of Speakers with Dysarthria , 2006, Folia Phoniatrica et Logopaedica.

[7]  Hesham Tolba,et al.  Towards the improvement of automatic recognition of dysarthric speech , 2009, 2009 2nd IEEE International Conference on Computer Science and Information Technology.

[8]  M. Portnoff,et al.  Implementation of the digital phase vocoder using the fast Fourier transform , 1976 .

[9]  Raymond D. Kent,et al.  Acoustic studies of dysarthric speech: methods, progress, and potential. , 1999, Journal of communication disorders.

[10]  Stephen J. Cox,et al.  Modelling Errors in Automatic Speech Recognition for Dysarthric Speakers , 2009, EURASIP J. Adv. Signal Process..

[11]  A. Neel,et al.  Effects of loud and amplified speech on sentence and word intelligibility in Parkinson disease. , 2009, Journal of speech, language, and hearing research : JSLHR.

[12]  Eric Sanders,et al.  Automatic Recognition Of Dutch Dysarthric Speech, A Pilot Study , 2002 .

[13]  David R Beukelman,et al.  Listener comprehension of severely dysarthric speech: effects of linguistic cues and stimulus cohesion. , 2002, Journal of speech, language, and hearing research : JSLHR.

[15]  Douglas D. O'Shaughnessy Speech Communications: Human and Machine , 2012 .

[16]  Mark Tse Voice Transformation , 2003 .

[17]  William H. Press,et al.  Numerical Recipes: The Art of Scientific Computing, Second Edition , 1998 .

[18]  Paul Taylor,et al.  The architecture of the Festival speech synthesis system , 1998, SSW.

[19]  David B. Pisoni,et al.  Text-to-speech: the mitalk system , 1987 .

[20]  Phil D. Green,et al.  Automatic speech recognition with sparse training data for dysarthric speakers , 2003, INTERSPEECH.

[21]  Kristin Rosen,et al.  Automatic speech recognition and a review of its functioning with dysarthric speech , 2000 .

[22]  Mark Hasegawa-Johnson,et al.  State-Transition Interpolation and MAP Adaptation for HMM-based Dysarthric Speech Recognition , 2010, SLPAT@NAACL.

[23]  Frank Rudzicz,et al.  The TORGO database of acoustic and articulatory speech from speakers with dysarthria , 2012 .

[24]  Graeme Hirst,et al.  Towards a Comparative Database of Dysarthric Articulation , 2008 .

[25]  James Carmichael,et al.  A speech-controlled environmental control system for people with severe dysarthria. , 2007, Medical engineering & physics.

[26]  H. Timothy Bunnell,et al.  The Nemours database of dysarthric speech , 1996, Proceeding of Fourth International Conference on Spoken Language Processing. ICSLP '96.

[27]  William A. Sethares,et al.  Rhythm and Transforms , 2007 .

[28]  Ren-Hua Wang,et al.  The USTC System for Blizzard Challenge 2010 , 2010 .

[29]  Tetsuya Takiguchi,et al.  Integration of Metamodel and Acoustic Model for Dysarthric Speech Recognition , 2009, J. Multim..

[30]  Prasad D Polur,et al.  Investigation of an HMM/ANN hybrid structure in pattern recognition application using cepstral analysis of dysarthric (distorted) speech signals. , 2006, Medical engineering & physics.

[31]  Frank Rudzicz,et al.  Acoustic transformations to improve the intelligibility of dysarthric speech , 2011 .

[32]  Juergen Schroeter,et al.  Basic Principles of Speech Synthesis , 2008 .

[33]  Hideki Kawahara,et al.  STRAIGHT, exploitation of the other aspect of VOCODER: Perceptually isomorphic decomposition of speech sounds , 2006 .

[34]  Keiichi Tokuda,et al.  Spectral conversion based on maximum likelihood estimation considering global variance of converted parameter , 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005..

[35]  F. A. Seiler,et al.  Numerical Recipes in C: The Art of Scientific Computing , 1989 .

[36]  Sheri Hunnicutt,et al.  An investigation of different degrees of dysarthric speech as input to speaker-adaptive and speaker-dependent recognition systems , 2001 .

[37]  Hideki Kawahara,et al.  Auditory morphing based on an elastic perceptual distance metric in an interference-free time-frequency representation , 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03)..

[38]  Andrzej Czyzewski,et al.  Intelligent Processing of Stuttered Speech , 2003, Journal of Intelligent Information Systems.

[39]  P. Enderby,et al.  Frenchay Dysarthria Assessment , 1983 .

[40]  Arthur R. Toth,et al.  Cross-speaker articulatory position data for phonetic feature prediction , 2005, INTERSPEECH.

[41]  Qin Yan,et al.  Formant tracking linear prediction model using HMMs and Kalman filters for noisy speech processing , 2007, Comput. Speech Lang..

[42]  John-Paul Hosom,et al.  Intelligibility of modifications to dysarthric speech , 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03)..

[43]  Wolfgang Hess,et al.  Pitch and Voicing Determination of Speech with an Extension Toward Music Signals , 2008 .

[44]  F. Rudzicz,et al.  Production Knowledge in the Recognition of Dysarthric Speech , 2011 .

[45]  Hideki Kawahara,et al.  Nearly defect-free F0 trajectory extraction for expressive speech modifications based on STRAIGHT , 2005, INTERSPEECH.

[46]  W. Press,et al.  Numerical Recipes in C++: The Art of Scientific Computing (2nd edn) , 2003 .

[47]  Tatsuya Kawahara,et al.  Voice activity detection based on optimally weighted combination of multiple features , 2005, INTERSPEECH.

[48]  Vladimir I. Levenshtein,et al.  Binary codes capable of correcting deletions, insertions, and reversals , 1965 .

[49]  Raymond D. Kent,et al.  Toward phonetic intelligibility testing in dysarthria. , 1989, The Journal of speech and hearing disorders.

[50]  Ben Maassen et al. Speech motor control in normal and disordered speech , 2014 .

[51]  Acoustic Phonetics , 2019, The SAGE Encyclopedia of Human Communication Sciences and Disorders.

[52]  Eric Moulines,et al.  Pitch-synchronous waveform processing techniques for text-to-speech synthesis using diphones , 1989, Speech Commun..

[53]  Keiichi Tokuda,et al.  Speaker adaptation and the evaluation of speaker similarity in the EMIME speech-to-speech translation project , 2010, SSW.

[54]  Murray F. Spiegel,et al.  Comprehensive assessment of the telephone intelligibility of synthesized and natural speech , 1990, Speech Commun..

[55]  F. Milinazzo,et al.  Formant location from LPC analysis data , 1993, IEEE Trans. Speech Audio Process..

[56]  Karen A Hux,et al.  Accuracy of three speech recognition systems: Case study of dysarthric speech , 2000 .

[57]  Timothy Arbisi-Kelm,et al.  Intonation Structure and Disfluency Detection in Stuttering , 2007 .

[58]  M. Buchholz,et al.  Speech recognition and dysarthria: a single subject study of two individuals with profound impairment of speech and motor control , 2003, Logopedics, phoniatrics, vocology.