Articulatory Knowledge in the Recognition of Dysarthric Speech

Dysarthric speech is not well served by modern generative and acoustic-only models of automatic speech recognition (ASR). This work considers the use of theoretical and empirical knowledge of the vocal tract in labeling segmented and unsegmented sequences of atypical speech. The resulting models, which combine acoustic and articulatory information, are compared against discriminative models such as neural networks, support vector machines, and conditional random fields. Results show significant improvements in accuracy over the baseline through the use of production knowledge. Furthermore, although the statistics of vocal tract movement do not appear to transfer between typical and dysarthric speakers, transforming the space of the former given knowledge of the latter before retraining gives high accuracy. This work may be applied within components of assistive software for speakers with dysarthria.
