Impact of different speaking modes on EMG-based speech recognition

We present our recent results on speech recognition by surface electromyography (EMG), which captures the electric potentials generated by the human articulatory muscles. This technique can enable Silent Speech Interfaces, since EMG signals are generated even when people articulate speech without producing any sound. Preliminary experiments have shown that the EMG signals created by audible and silent speech are quite distinct. In this paper, we first compare various methods of initializing a silent speech EMG recognizer, showing that recognizer performance varies substantially across speakers. Based on this, we analyze EMG signals from audible and silent speech, present initial results on how discrepancies between these speaking modes affect EMG recognizers, and suggest areas for future work.

Index Terms: speech recognition, surface electromyography, silent speech, articulation
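Since the abstract only sketches the recognition front end, the following is a minimal illustration of how time-domain features are commonly extracted from raw EMG before classification in this line of work. The sampling rate, window length, frame shift, and feature set below are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

def td_features(emg, fs=600, win_ms=27, shift_ms=10):
    """Frame a raw single-channel EMG signal and compute simple
    time-domain features per frame: mean, power, and zero-crossing
    rate -- a reduced variant of the time-domain feature sets used
    in EMG-based speech recognition.
    """
    win = int(fs * win_ms / 1000)     # samples per analysis window
    shift = int(fs * shift_ms / 1000) # samples per frame shift
    feats = []
    for start in range(0, len(emg) - win + 1, shift):
        frame = emg[start:start + win]
        mean = frame.mean()
        power = np.mean(frame ** 2)
        # Count sign changes around the frame mean, normalized per sample
        zcr = np.mean(np.abs(np.diff(np.sign(frame - mean)))) / 2
        feats.append([mean, power, zcr])
    return np.array(feats)

# Usage: 2 seconds of synthetic single-channel EMG at an assumed 600 Hz
rng = np.random.default_rng(0)
emg = rng.standard_normal(1200)
X = td_features(emg)
print(X.shape)  # (n_frames, 3): one feature row per 10 ms frame shift
```

The resulting per-frame feature matrix is the kind of input a frame-based recognizer (e.g., an HMM-based classifier) would be trained on, separately for audible and silent speaking modes.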
