Speech interfaces based upon surface electromyography

This paper discusses the use of surface electromyography (EMG) to recognize and synthesize speech. The acoustic speech signal can be significantly corrupted by high levels of environmental noise or impeded by garments or masks. Such situations occur, for example, when firefighters wear pressurized suits with a self-contained breathing apparatus (SCBA) or when astronauts perform operations in pressurized gear. Under these conditions it is important to capture and transmit clear speech commands despite the corrupted or distorted acoustic signal. One way to mitigate this problem is to use surface electromyography to capture the activity of the speech articulators and then either recognize spoken commands directly from the EMG signals or use these signals to synthesize acoustic speech. We describe a set of experiments in both speech recognition and speech synthesis based on surface electromyography and discuss the lessons learned about the characteristics of the EMG signal in these domains. The experiments include recognition of a 15-command vocabulary in high noise for firefighters wearing SCBA, a five-word sub-vocal speech experiment for controlling a robotic platform, a speech recognition experiment testing recognition of vowels and consonants, and a speech synthesis experiment based on an articulatory speech synthesizer.
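
The abstract does not specify the feature set or classifier used in the recognition experiments. Purely as an illustrative sketch, the Python code below shows one common small-vocabulary myoelectric pipeline: band-pass filtering of the raw EMG, standard time-domain features (mean absolute value, RMS, zero-crossing rate, waveform length), and a toy nearest-centroid word classifier. The sampling rate, filter band, and all names are assumptions for illustration, not details taken from the paper.

```python
import numpy as np
from scipy.signal import butter, filtfilt

FS = 2000  # assumed surface-EMG sampling rate (Hz); not given in the paper
# 4th-order Butterworth band-pass over a typical surface-EMG band (assumed)
B, A = butter(4, [20, 450], btype="bandpass", fs=FS)

def extract_features(window):
    """Time-domain features per channel; `window` is (samples, channels)."""
    x = filtfilt(B, A, window, axis=0)  # zero-phase filtering of each channel
    mav = np.mean(np.abs(x), axis=0)                                # mean absolute value
    rms = np.sqrt(np.mean(x ** 2, axis=0))                          # RMS energy
    zc = np.mean(np.abs(np.diff(np.sign(x), axis=0)) > 0, axis=0)   # zero-crossing rate
    wl = np.sum(np.abs(np.diff(x, axis=0)), axis=0)                 # waveform length
    return np.concatenate([mav, rms, zc, wl])

class NearestCentroid:
    """Toy stand-in for the classifiers (e.g. HMMs, neural networks) common
    in the EMG speech-recognition literature: one centroid per command word."""
    def fit(self, X, y):
        self.classes_ = np.unique(y)
        self.centroids_ = np.stack([X[y == c].mean(axis=0) for c in self.classes_])
        return self
    def predict(self, X):
        d = np.linalg.norm(X[:, None, :] - self.centroids_[None, :, :], axis=2)
        return self.classes_[d.argmin(axis=1)]

# Hypothetical usage: classify 0.5 s EMG windows into one of 15 commands.
# windows: (n_examples, samples, channels); labels: array of command strings
# feats = np.stack([extract_features(w) for w in windows])
# model = NearestCentroid().fit(feats, labels)
# print(model.predict(feats[:5]))
```

This is a deliberately minimal baseline; a real system along the lines the paper describes would likely use a sequence model or neural network rather than a centroid classifier.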
