A Comparative Study of Different EMG Features for Acoustics-to-EMG Mapping

Electromyography (EMG) signals have been used extensively to capture facial muscle movements during speech, since they are among the bio-signals most closely related to speech production. In this work, we focus on predicting EMG from speech acoustics. We present a comparative study of ten EMG-based features, including the time-domain (TD) features existing in the literature, to examine their effectiveness in acoustics-to-EMG inverse (AEI) mapping. We also propose a novel feature based on the Hilbert envelope of the filtered EMG signal, and we reconstruct the raw EMG signal from these features. For the AEI mapping, we use a bi-directional long short-term memory (BLSTM) network trained in a session-dependent manner. To estimate the raw EMG signal from the EMG features, we use a CNN-BLSTM model comprising a convolutional neural network (CNN) followed by BLSTM layers. AEI mapping performance with the BLSTM network shows that, among all the features, the Hilbert-envelope-based feature is predicted from speech with the highest accuracy. It could therefore be the most representative feature of the underlying muscle activation during speech production. When used together with the existing TD features, the proposed Hilbert envelope feature improves raw EMG signal reconstruction compared with using the TD features alone.
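
As a rough illustration of the Hilbert envelope feature mentioned above, the following sketch filters a signal, takes the magnitude of its analytic signal, and averages that envelope over frames. It is a minimal example under stated assumptions, not the authors' implementation: the brick-wall FFT filter, the 20 to 450 Hz passband, the 1 kHz sampling rate, the 25-sample frame with a 10-sample hop, and the function names `bandpass_fft`, `hilbert_envelope`, and `frame_average` are all hypothetical choices made here, since the abstract does not specify them. A synthetic amplitude-modulated sinusoid stands in for an EMG channel.

```python
import numpy as np


def bandpass_fft(x, fs, low_hz, high_hz):
    """Crude brick-wall band-pass: zero FFT bins outside [low_hz, high_hz].

    Assumption: the abstract only says "filtered"; this filter is a stand-in.
    """
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    spectrum[(freqs < low_hz) | (freqs > high_hz)] = 0.0
    return np.fft.irfft(spectrum, n=len(x))


def hilbert_envelope(x):
    """Magnitude of the analytic signal, built by zeroing negative frequencies."""
    n = len(x)
    spectrum = np.fft.fft(x)
    weights = np.zeros(n)
    weights[0] = 1.0                      # keep the DC bin once
    if n % 2 == 0:
        weights[n // 2] = 1.0             # keep the Nyquist bin once
        weights[1:n // 2] = 2.0           # double the positive frequencies
    else:
        weights[1:(n + 1) // 2] = 2.0
    return np.abs(np.fft.ifft(spectrum * weights))


def frame_average(envelope, frame_len, hop):
    """Average the envelope over sliding frames to get one value per frame."""
    starts = range(0, len(envelope) - frame_len + 1, hop)
    return np.array([envelope[s:s + frame_len].mean() for s in starts])


# Synthetic stand-in for one EMG channel (hypothetical values throughout):
# a 100 Hz carrier whose amplitude varies slowly at 2 Hz.
fs = 1000
t = np.arange(fs) / fs
am = 1.0 + 0.5 * np.sin(2 * np.pi * 2 * t)
signal = am * np.sin(2 * np.pi * 100 * t)

filtered = bandpass_fft(signal, fs, low_hz=20.0, high_hz=450.0)
envelope = hilbert_envelope(filtered)
features = frame_average(envelope, frame_len=25, hop=10)
```

On this synthetic input the envelope tracks the slow amplitude modulation `am`, which is the sense in which such a feature could reflect the level of muscle activation rather than the fast oscillations of the raw signal.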
