Tackling Speaking Mode Varieties in EMG-Based Speech Recognition

An electromyographic (EMG) silent speech recognizer is a system that recognizes speech by capturing the electric potentials of the human articulatory muscles, thus enabling the user to communicate silently. Having established a baseline EMG-based continuous speech recognizer, we investigate in this paper speaking mode variations, i.e., discrepancies between audible and silent speech that degrade recognition accuracy. We introduce multimode systems that allow seamless switching between audible and silent speech, investigate different measures that quantify speaking mode differences, and present the spectral mapping algorithm, which improves the word error rate (WER) on silent speech by up to 14.3% relative. Our best average WER is 34.7% on silent speech and 16.8% on audible speech.
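
For readers unfamiliar with the metrics quoted above, the following is a minimal sketch of how a word error rate and a relative improvement are conventionally computed. It uses the standard word-level edit distance definition of WER; the function names and the example numbers are illustrative assumptions and are not taken from the paper's experiments.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference prefix processed
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer: float, improved_wer: float) -> float:
    """Relative WER reduction, i.e. the drop expressed as a fraction of the baseline."""
    return (baseline_wer - improved_wer) / baseline_wer


if __name__ == "__main__":
    # Hypothetical sentences, not data from the paper.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
    # Hypothetical WERs, not results from the paper.
    print(relative_reduction(0.40, 0.34))
```

A "relative" improvement is measured against the baseline error rate rather than in absolute percentage points, so a drop from a hypothetical 40% WER to 34% WER is a 6-point absolute reduction but a 15% relative reduction.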
