Investigations on Speaking Mode Discrepancies in EMG-Based Speech Recognition

In this paper we present our recent study on the impact of speaking mode variability on speech recognition based on surface electromyography (EMG). Surface electromyography captures the electric potentials of the human articulatory muscles, which enables a user to communicate naturally without making any audible sound. Our previous experiments have shown that the EMG signal varies greatly between different speaking modes, such as audibly uttered and silently articulated speech. In this study we extend our previous research and quantify the impact of different speaking modes by investigating the number of mode-specific leaves in phonetic decision trees. We show that this measure correlates strongly with discrepancies in the spectral energy of the EMG signal, as well as with differences in recognizer performance across speaking modes. Finally, we show how adapting the EMG signal by spectral mapping reduces the effect of the speaking mode.
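To make the proposed measure concrete, the following is a minimal sketch of how the fraction of mode-specific decision-tree leaves could be computed. The `Leaf` representation, the per-mode frame counts, and the 0.9 dominance threshold are illustrative assumptions, not details taken from the paper.

```python
# Sketch (not the authors' code): estimate how strongly a phonetic decision
# tree separates speaking modes, by counting leaves whose training frames
# come predominantly from a single mode ("mode-specific" leaves).

from dataclasses import dataclass, field

@dataclass
class Leaf:
    # Training-frame counts per speaking mode, e.g. {"audible": 120, "silent": 4}
    mode_counts: dict = field(default_factory=dict)

def is_mode_specific(leaf: Leaf, threshold: float = 0.9) -> bool:
    """A leaf counts as mode-specific if one mode dominates its training data."""
    total = sum(leaf.mode_counts.values())
    if total == 0:
        return False
    return max(leaf.mode_counts.values()) / total >= threshold

def mode_specific_fraction(leaves: list[Leaf], threshold: float = 0.9) -> float:
    """Fraction of mode-specific leaves; a higher value indicates a larger
    mismatch between the speaking modes in the trained models."""
    specific = sum(is_mode_specific(leaf, threshold) for leaf in leaves)
    return specific / len(leaves) if leaves else 0.0

# Example: two leaves shared across modes, one dominated by silent speech
leaves = [
    Leaf({"audible": 50, "silent": 45}),
    Leaf({"audible": 48, "silent": 52}),
    Leaf({"audible": 2, "silent": 98}),
]
print(mode_specific_fraction(leaves))  # -> 0.333...
```

Under this reading, a value near zero means audible and silent EMG data share the same leaf models, while a value near one means the tree has effectively split the models by mode.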

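For the adaptation step, one plausible form of spectral mapping is a per-frequency-bin gain that pulls silent-speech EMG spectra toward audible-speech statistics. The sketch below implements this simple variant; it is an assumption for illustration and not the exact mapping described in the paper. All array shapes and the synthetic data are hypothetical.

```python
# Sketch of a simple spectral mapping between speaking modes, assuming a
# per-bin multiplicative gain estimated from mean magnitude spectra.

import numpy as np

def estimate_spectral_map(audible_frames: np.ndarray,
                          silent_frames: np.ndarray,
                          eps: float = 1e-8) -> np.ndarray:
    """Estimate per-bin gains from the mean magnitude spectra of both modes.
    Inputs are arrays of shape (num_frames, frame_length)."""
    aud_mag = np.abs(np.fft.rfft(audible_frames, axis=1)).mean(axis=0)
    sil_mag = np.abs(np.fft.rfft(silent_frames, axis=1)).mean(axis=0)
    return aud_mag / (sil_mag + eps)

def apply_spectral_map(frames: np.ndarray, gains: np.ndarray) -> np.ndarray:
    """Scale each frame's spectrum by the gains and resynthesize the signal."""
    spec = np.fft.rfft(frames, axis=1)
    return np.fft.irfft(spec * gains, n=frames.shape[1], axis=1)

# Usage: adapt silent-speech EMG frames before feature extraction
rng = np.random.default_rng(0)
audible = rng.standard_normal((200, 256))   # placeholder audible-mode frames
silent = 0.5 * rng.standard_normal((200, 256))  # placeholder silent-mode frames
gains = estimate_spectral_map(audible, silent)
adapted = apply_spectral_map(silent, gains)
```

The design intent is that adapted silent-speech frames exhibit spectral energy closer to the audible-speech training data, which is consistent with the paper's observation that spectral-energy discrepancies track the speaking-mode effect.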