Development of sEMG sensors and algorithms for silent speech recognition

OBJECTIVE Speech is among the most natural forms of human communication, making it an attractive modality for human-machine interaction through automatic speech recognition (ASR). However, the limitations of ASR, including degradation in the presence of ambient noise, limited privacy, and poor accessibility for those with significant speech disorders, have motivated the need for alternative, non-acoustic modalities of subvocal or silent speech recognition (SSR).

APPROACH We have developed a new system of face- and neck-worn sensors and signal-processing algorithms capable of recognizing silently mouthed words and phrases entirely from the surface electromyographic (sEMG) signals recorded from the muscles of the face and neck involved in speech production. The algorithms were developed by progressively evolving the recognition models: first recognizing isolated words by extracting speech-related features from sEMG signals, then recognizing sequences of words from patterns of sEMG signals using grammar models, and finally recognizing a vocabulary of previously untrained words using phoneme-based models. The final recognition algorithms were integrated with specially designed multi-point, miniaturized sensors that can be arranged in flexible geometries to record high-fidelity sEMG measurements from the small articulator muscles of the face and neck.

MAIN RESULTS We tested the system of sensors and algorithms in a series of subvocal speech experiments involving more than 1200 phrases generated from a 2200-word vocabulary and achieved an 8.9% word error rate (91.1% recognition rate), far surpassing previous attempts in the field.
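The first stage of the approach, extracting speech-related features from sEMG signals, can be illustrated with a minimal sketch. The specific features, sampling rate, and window sizes below are illustrative assumptions for a single-channel signal, not the paper's actual feature set; typical sEMG front ends use short sliding windows with time-domain descriptors such as root-mean-square amplitude and zero-crossing rate.

```python
import numpy as np

def semg_frame_features(signal, fs=2000, win_ms=25.0, step_ms=10.0):
    """Compute simple time-domain features (RMS and zero-crossing rate)
    over sliding windows of a single-channel sEMG signal.

    fs, win_ms, and step_ms are hypothetical values for illustration only.
    Returns an array of shape (n_frames, 2): [RMS, ZCR] per frame.
    """
    win = int(fs * win_ms / 1000)   # samples per analysis window
    step = int(fs * step_ms / 1000) # hop between successive windows
    feats = []
    for start in range(0, len(signal) - win + 1, step):
        frame = signal[start:start + win]
        rms = np.sqrt(np.mean(frame ** 2))                  # signal energy
        zcr = np.mean(np.abs(np.diff(np.sign(frame))) > 0)  # sign-change rate
        feats.append((rms, zcr))
    return np.array(feats)
```

Feature vectors of this kind, stacked across sensor channels, would then serve as observations for the word-, grammar-, and phoneme-level recognition models described above.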
SIGNIFICANCE These results demonstrate the viability of our system as an alternative modality of communication for a multitude of applications: persons with speech impairments following a laryngectomy; military personnel requiring hands-free covert communication; or consumers in need of privacy while speaking on a mobile phone in public.
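The 8.9% word error rate reported above follows the standard definition used in speech recognition: the word-level Levenshtein edit distance (substitutions + insertions + deletions) between the recognized hypothesis and the reference transcript, divided by the number of reference words. A minimal sketch of that standard metric (not the authors' own scoring code):

```python
def word_error_rate(reference, hypothesis):
    """Word error rate = (subs + ins + dels) / len(reference words),
    computed via dynamic-programming edit distance over word tokens."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                       # deleting all of ref[:i]
    for j in range(len(hyp) + 1):
        d[0][j] = j                       # inserting all of hyp[:j]
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)
```

Note that WER can exceed 100% when the hypothesis contains many insertions, which is why it is reported alongside the complementary recognition rate.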
