Acoustic transformations to improve the intelligibility of dysarthric speech

This paper describes modifications to acoustic speech signals produced by speakers with dysarthria that make those utterances more intelligible to typical listeners. These modifications include the correction of tempo, the adjustment of formant frequencies in sonorants, the removal of aberrant voicing, the deletion of phoneme insertion errors, and the restoration of erroneously dropped phonemes. Through simple intelligibility evaluations with naive listeners, we show that correcting phoneme errors yields the greatest increase in intelligibility and is therefore a desirable mechanism for the eventual creation of augmentative application software for individuals with dysarthria.
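Locating phoneme insertion and deletion errors presupposes an alignment between the phoneme sequence a speaker actually produced and the intended target sequence; a standard way to obtain one is Levenshtein (edit-distance) alignment. The sketch below is illustrative only, not the paper's implementation: the phoneme symbols, operation labels, and sequences are invented for the example.

```python
# Hedged sketch: Levenshtein alignment of a produced phoneme sequence
# against the intended target, labelling inserted phonemes (to delete)
# and dropped phonemes (to restore). Symbols/labels are illustrative,
# not taken from the paper.

def align(target, produced):
    """Return a list of (op, target_phone, produced_phone) edit operations."""
    m, n = len(target), len(produced)
    # dp[i][j] = edit distance between target[:i] and produced[:j]
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i
    for j in range(n + 1):
        dp[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if target[i - 1] == produced[j - 1] else 1
            dp[i][j] = min(dp[i - 1][j] + 1,        # speaker dropped a phoneme
                           dp[i][j - 1] + 1,        # speaker inserted a phoneme
                           dp[i - 1][j - 1] + cost) # match or substitution
    # Backtrace, preferring exact matches, then insertions/deletions.
    ops, i, j = [], m, n
    while i > 0 or j > 0:
        if (i > 0 and j > 0 and target[i - 1] == produced[j - 1]
                and dp[i][j] == dp[i - 1][j - 1]):
            ops.append(("match", target[i - 1], produced[j - 1]))
            i, j = i - 1, j - 1
        elif j > 0 and dp[i][j] == dp[i][j - 1] + 1:
            # extra phoneme in the production: candidate for deletion
            ops.append(("delete_insertion", None, produced[j - 1]))
            j -= 1
        elif i > 0 and dp[i][j] == dp[i - 1][j] + 1:
            # phoneme missing from the production: candidate for restoration
            ops.append(("restore_dropped", target[i - 1], None))
            i -= 1
        else:
            ops.append(("substitute", target[i - 1], produced[j - 1]))
            i, j = i - 1, j - 1
    return list(reversed(ops))

# Invented example: target /k ae t s/, produced /k ae ah t/
# (one inserted /ah/, one dropped /s/).
ops = align(["k", "ae", "t", "s"], ["k", "ae", "ah", "t"])
```

Once the alignment is in hand, each `delete_insertion` marks a segment to excise from the signal and each `restore_dropped` marks a position at which a synthesized phone could be spliced in.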
