Classification of voiceless speech using facial muscle activity and vision based techniques

This paper presents a silent speech recognition technique based on facial muscle activity and video, without evaluating any voice signals. The research examines the use of the facial surface electromyogram (SEMG) to identify unvoiced vowels and a vision-based technique to classify unvoiced consonants. The moving root mean square (RMS) of the SEMG signals of four facial muscles is used to segment the signals and to identify the start and end of silently spoken vowels. Visual features are extracted from mouth video of a speaker silently uttering consonants, using motion segmentation and image moment techniques. The SEMG features and the visual features are classified using feedforward multilayer perceptron (MLP) neural networks. The preliminary results demonstrate that the proposed technique yields a high recognition rate for classification of unvoiced vowels using SEMG features. Similarly, promising results are obtained in the identification of consonants using visual features. The results demonstrate that the system is easy to train for a new user and suggest that, once trained for a user, such a system works reliably for voiceless, simple speech-based commands in human-computer interfaces.
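The moving-RMS segmentation step can be illustrated with a minimal sketch. The abstract does not give the window length, the threshold, or how the four channels are combined, so `window`, `threshold`, and the choice to sum the per-channel RMS envelopes are assumptions made only for illustration, and the function names are hypothetical.

```python
import math


def moving_rms(signal, window):
    """Trailing moving RMS: output i covers up to `window` samples ending at i."""
    out = []
    acc = 0.0  # running sum of squares inside the window
    for i, x in enumerate(signal):
        acc += x * x
        if i >= window:
            acc -= signal[i - window] ** 2  # drop the sample leaving the window
        n = min(i + 1, window)
        out.append(math.sqrt(max(acc, 0.0) / n))
    return out


def find_activity(channels, window, threshold):
    """Return (start, end) sample indices of muscle activity, or None.

    Assumption: the RMS envelopes of all channels are summed and compared
    against a single fixed threshold; the paper may combine them differently.
    """
    envelopes = [moving_rms(ch, window) for ch in channels]
    combined = [sum(values) for values in zip(*envelopes)]
    active = [i for i, v in enumerate(combined) if v > threshold]
    if not active:
        return None
    return active[0], active[-1]


if __name__ == "__main__":
    # Synthetic stand-in for four SEMG channels: silence, a burst, silence.
    burst = [0.0] * 200 + [1.0] * 100 + [0.0] * 200
    print(find_activity([burst] * 4, window=20, threshold=1.0))
```

Because the window is trailing, the detected end lags the true end of the burst by up to one window length; a centred window or a per-user calibrated threshold would be reasonable alternatives.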
