Prediction of Acoustic Feature Parameters Using Myoelectric Signals

It is well known that a clear relationship exists between human voices and the myoelectric signals (MESs) measured around the speaker's mouth. In this study, we exploited this relationship to implement a speech synthesis scheme in which the MES alone was used to predict the parameters characterizing the vocal-tract transfer function of specific speech signals. Several feature parameters derived from the MES were investigated to find the feature that maximizes the mutual information between the acoustic and the MES features. After the optimal feature was determined, an estimation rule for the acoustic parameters was proposed, based on a minimum mean square error (MMSE) criterion. In a preliminary study, 60 isolated words were used for both objective and subjective evaluations. The results showed that the average Euclidean distance between the original and predicted acoustic parameters was reduced by about 30% compared with the average Euclidean distance of the original parameters. The intelligibility of the speech signals synthesized from the predicted features was also evaluated: a listening test yielded a word-level identification rate of 65.5% and a syllable-level identification rate of 73%.
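The abstract does not spell out the estimator, but an MMSE estimate of acoustic parameters given MES features is commonly computed under a joint Gaussian mixture model (GMM) of the two feature streams: the estimate is the responsibility-weighted sum of per-component conditional means. The sketch below illustrates that computation; the function name, variable layout, and the GMM assumption itself are illustrative, not taken from the paper.

```python
import numpy as np

def mmse_estimate(x, weights, mu_x, mu_y, S_xx, S_yx):
    """MMSE estimate E[y | x] under a joint Gaussian mixture model.

    x       : (dx,)        observed MES feature vector
    weights : (K,)         mixture weights
    mu_x    : (K, dx)      per-component means of the MES features
    mu_y    : (K, dy)      per-component means of the acoustic parameters
    S_xx    : (K, dx, dx)  per-component covariance of x
    S_yx    : (K, dy, dx)  per-component cross-covariance of y and x
    """
    K = len(weights)
    # Posterior responsibility of each component given x: p(k | x),
    # computed in the log domain for numerical stability.
    log_resp = np.empty(K)
    for k in range(K):
        d = x - mu_x[k]
        _, logdet = np.linalg.slogdet(S_xx[k])
        log_resp[k] = (np.log(weights[k])
                       - 0.5 * (logdet + d @ np.linalg.solve(S_xx[k], d)))
    resp = np.exp(log_resp - log_resp.max())
    resp /= resp.sum()
    # MMSE estimate: responsibility-weighted per-component conditional means.
    y_hat = np.zeros(mu_y.shape[1])
    for k in range(K):
        d = x - mu_x[k]
        y_hat += resp[k] * (mu_y[k] + S_yx[k] @ np.linalg.solve(S_xx[k], d))
    return y_hat
```

With a single mixture component this reduces to ordinary linear regression of y on x; additional components let the mapping vary across regions of the MES feature space.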
