Synthesizing speech from electromyography using voice transformation techniques

Surface electromyography (EMG) can be used to record the activation potentials of articulatory muscles while a person speaks. This technique could enable silent speech interfaces, as EMG signals are generated even when people pantomime speech without producing sound. Effective silent speech interfaces would enable a number of compelling applications, allowing people to communicate in areas where they do not want to be overheard or where background noise is so pervasive that they could not be heard. To use EMG signals in speech interfaces, however, there must be a reasonably accurate method for mapping the signals to speech. To date, most attempts to use EMG signals for speech interfaces have focused on Automatic Speech Recognition (ASR) based on features derived from EMG signals. Following the lead of researchers who have worked with Electro-Magnetic Articulograph (EMA) data and Non-Audible Murmur (NAM) speech, we explore the alternative idea of using Voice Transformation (VT) techniques to synthesize speech from EMG signals. With speech output, both ASR systems and human listeners can use EMG-based systems directly. We report the results of our preliminary studies, noting the difficulties we encountered and suggesting areas for future work.

Index Terms: electromyography, silent speech, voice transformation, speech synthesis
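
As a concrete illustration of the kind of VT mapping the abstract describes, the sketch below implements joint-density GMM-based spectral conversion, a standard voice transformation technique: a Gaussian mixture model is trained on time-aligned pairs of source (EMG-derived) and target (speech) spectral feature vectors, and conversion computes the minimum mean-squared-error estimate of the target features given the source features. This is a minimal sketch under stated assumptions, not the authors' implementation; the helper names (train_joint_gmm, convert), feature dimensions, and mixture size are all hypothetical.

    # Minimal sketch of joint-density GMM-based spectral mapping, a standard
    # voice transformation technique. Feature shapes, helper names, and the
    # mixture size are illustrative assumptions, not the paper's code.

    import numpy as np
    from scipy.stats import multivariate_normal
    from sklearn.mixture import GaussianMixture


    def train_joint_gmm(src, tgt, n_components=8):
        """Fit a GMM to time-aligned [source; target] feature vectors.

        src: (T, Dx) frame-level EMG-derived features (hypothetical)
        tgt: (T, Dy) parallel speech spectral features, e.g. mel-cepstra
        """
        joint = np.hstack([src, tgt])  # stack per-frame pairs: (T, Dx + Dy)
        gmm = GaussianMixture(n_components=n_components,
                              covariance_type="full", random_state=0)
        return gmm.fit(joint)


    def convert(gmm, src, dx):
        """MMSE conversion: return E[y | x] under the joint GMM, per frame."""
        mu_x = gmm.means_[:, :dx]               # (K, Dx) source means
        mu_y = gmm.means_[:, dx:]               # (K, Dy) target means
        sig_xx = gmm.covariances_[:, :dx, :dx]  # (K, Dx, Dx)
        sig_yx = gmm.covariances_[:, dx:, :dx]  # (K, Dy, Dx) cross-covariance

        # Responsibilities p(k | x), computed from the source marginal only.
        resp = np.stack([gmm.weights_[k]
                         * multivariate_normal.pdf(src, mu_x[k], sig_xx[k])
                         for k in range(gmm.n_components)], axis=1)
        resp /= resp.sum(axis=1, keepdims=True)

        out = np.zeros((src.shape[0], mu_y.shape[1]))
        for k in range(gmm.n_components):
            # Conditional mean of y given x under mixture component k.
            A = sig_yx[k] @ np.linalg.inv(sig_xx[k])
            out += resp[:, [k]] * (mu_y[k] + (src - mu_x[k]) @ A.T)
        return out


    # Toy usage with random stand-ins for time-aligned EMG / speech features.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((500, 10))                  # pretend EMG features
    Y = X @ rng.standard_normal((10, 24)) + 0.1 * rng.standard_normal((500, 24))
    Y_hat = convert(train_joint_gmm(X, Y), X, dx=10)
    print(Y_hat.shape)  # (500, 24)

In practice, VT systems of this family improve on such a frame-independent mapping by adding dynamic (delta) features and global variance compensation during conversion, which reduces the over-smoothing audible in plain MMSE output.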
