Modeling coarticulation in EMG-based continuous speech recognition

This paper discusses the use of surface electromyography (EMG) for automatic speech recognition. Electromyographic signals captured at the facial muscles record the activity of the human articulatory apparatus and thus make it possible to recover a speech signal even when it is spoken silently. Since speech is captured before it becomes airborne, the resulting signal is not masked by ambient noise. The resulting Silent Speech Interface has the potential to overcome major limitations of conventional speech-driven interfaces: it is robust to environmental noise, allows confidential information to be transmitted silently, and does not disturb bystanders. We describe our new approach of phonetic feature bundling for modeling coarticulation in EMG-based speech recognition and report results on the EMG-PIT corpus, a recently collected multi-speaker large-vocabulary database of silent and audible EMG speech recordings. Our results on speaker-dependent and speaker-independent setups show that modeling the interdependence of phonetic features reduces the word error rate of the baseline system by over 33% relative. Our final system achieves a 10% word error rate for the best-recognized speaker on a 101-word vocabulary task, bringing EMG-based speech recognition within a useful range for Silent Speech Interface applications.
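
To make the idea of phonetic feature bundling more concrete, the sketch below is a minimal, hypothetical illustration rather than the authors' implementation: the feature table, position tags, and the `feature_bundle` helper are assumptions. It decomposes each phone into articulatory features and bundles them with the features of the neighboring phones, so that coarticulation is represented through features shared across phone contexts.

```python
# Hypothetical feature table: phone -> set of phonetic features.
# (Illustrative entries only; a real system would cover the full phone set.)
PHONE_FEATURES = {
    "p": {"consonant", "bilabial", "plosive", "unvoiced"},
    "b": {"consonant", "bilabial", "plosive", "voiced"},
    "a": {"vowel", "open", "front"},
    "n": {"consonant", "alveolar", "nasal", "voiced"},
}

def feature_bundle(left, center, right):
    """Bundle the center phone's features with those of its neighbors.

    Position tags ("L"/"C"/"R") keep the center phone's own features
    distinct from coarticulatory context features.
    """
    bundle = set()
    for pos, phone in (("L", left), ("C", center), ("R", right)):
        if phone is not None:
            bundle |= {(pos, feat) for feat in PHONE_FEATURES[phone]}
    return frozenset(bundle)

# Example: /a/ in the context b-a-n. The bundle shares its ("C", ...) features
# with every other /a/, while ("L", "bilabial") exposes the coarticulation
# carried over from the preceding /b/.
print(sorted(feature_bundle("b", "a", "n")))
```

In a setup like the one described above, such position-tagged bundles could serve as the shared model units whose interdependence is learned, making the coarticulatory context explicit rather than treating each phone in isolation.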
