Towards a practical silent speech recognition system

Our recent efforts towards developing a practical silent speech recognition interface based on surface electromyography (sEMG) have resulted in significant advances in the hardware, software, and algorithmic components of the system. In this paper, we report our algorithmic progress, specifically: optimization of sEMG feature extraction parameters, advances in sEMG acoustic modeling, and sEMG sensor set reduction. The key findings are: 1) the gold-standard parameters for acoustic speech feature extraction are far from optimal for sEMG parameterization; 2) advances in state-of-the-art speech modeling can be leveraged to significantly improve continuous sEMG silent speech recognition accuracy; and 3) the number of sEMG sensors can be halved with little impact on the final recognition accuracy, and the optimal sensor subset can be selected efficiently using basic monophone HMM modeling.
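The abstract does not say which search procedure is used to pick the sensor subset. The sketch below assumes greedy sequential backward elimination: starting from the full sensor set, it repeatedly drops the sensor whose removal costs the least score. The `score` callable is a hypothetical stand-in for training and evaluating a basic monophone HMM recognizer on the candidate subset; the per-sensor weights in `toy_score` are invented purely for illustration and are not results from this work.

```python
from typing import Callable, FrozenSet, List, Sequence, Tuple

# Hypothetical scoring interface: in practice this would train a monophone
# HMM on the given sensor channels and return its recognition accuracy.
Score = Callable[[FrozenSet[int]], float]


def backward_sensor_selection(
    sensors: Sequence[int], score: Score, target_size: int
) -> List[Tuple[FrozenSet[int], float]]:
    """Greedy sequential backward elimination over sensor indices.

    Returns the path of (subset, score) pairs, from the full set down to a
    subset of `target_size` sensors.
    """
    if not 1 <= target_size <= len(sensors):
        raise ValueError("target_size must be between 1 and the number of sensors")
    current = frozenset(sensors)
    path = [(current, score(current))]
    while len(current) > target_size:
        # Score every subset that has exactly one sensor removed.
        candidates = [(current - {s}, score(current - {s})) for s in sorted(current)]
        # Keep the best-scoring subset; on ties, max() keeps the first candidate.
        current, best = max(candidates, key=lambda c: c[1])
        path.append((current, best))
    return path


# Hypothetical per-sensor contributions, used only to make the example runnable.
TOY_WEIGHTS = {0: 0.30, 1: 0.05, 2: 0.20, 3: 0.02, 4: 0.25, 5: 0.10, 6: 0.01, 7: 0.07}


def toy_score(subset: FrozenSet[int]) -> float:
    """Additive stand-in for monophone HMM accuracy on a sensor subset."""
    return sum(TOY_WEIGHTS[s] for s in subset)


if __name__ == "__main__":
    for subset, value in backward_sensor_selection(range(8), toy_score, 4):
        print(sorted(subset), round(value, 2))
```

Halving 8 sensors this way takes 8 + 7 + 6 + 5 = 26 subset evaluations, versus 70 for an exhaustive search over all 4-sensor subsets, which is why a cheap per-subset score such as a monophone model is attractive here. Greedy elimination is not guaranteed to find the globally best subset.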
