Lip reading using optical flow and support vector machines

This paper presents a lip reading technique that classifies discrete utterances without evaluating the acoustic signal. The technique analyzes video of lip motion by computing the optical flow (OF). Statistical properties of the vertical OF component form the feature vectors used to train a support vector machine (SVM) classifier. The effect of variation in speaking speed on system performance is minimized by removing zero-energy frames and normalizing the number of frames by interpolation. The resulting system is an efficient viseme classifier with high accuracy (95.9%) and specificity (98.1%), and a sensitivity of 66.4%. The experimental results demonstrate that the technique is insensitive to inter-speaker variations.
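The frame normalization and feature extraction steps described above might be sketched as follows. This is only an illustrative sketch, not the authors' implementation: the abstract does not say which statistics, energy threshold, or target frame count were used, so the per-frame mean and standard deviation, the `eps` threshold, `n_frames`, and all function names here are assumptions. The input is assumed to be a precomputed array of vertical OF fields with shape (frames, height, width).

```python
import numpy as np

def drop_zero_energy_frames(v_flow, eps=1e-6):
    """Keep frames whose mean squared vertical flow exceeds eps (assumed threshold)."""
    energy = np.mean(v_flow ** 2, axis=(1, 2))
    return v_flow[energy > eps]

def resample_frames(v_flow, n_frames):
    """Linearly interpolate along the time axis to exactly n_frames frames."""
    t = v_flow.shape[0]
    pos = np.linspace(0.0, t - 1, num=n_frames)   # fractional source frame index
    lo = np.floor(pos).astype(int)
    hi = np.minimum(lo + 1, t - 1)
    w = (pos - lo)[:, None, None]                 # interpolation weight per output frame
    return (1.0 - w) * v_flow[lo] + w * v_flow[hi]

def feature_vector(v_flow, n_frames=10, eps=1e-6):
    """Per-frame mean and std of vertical flow, concatenated (assumed statistics)."""
    active = drop_zero_energy_frames(v_flow, eps)
    if active.shape[0] == 0:
        raise ValueError("no frames with non-zero flow energy")
    fixed = resample_frames(active, n_frames)
    return np.concatenate([fixed.mean(axis=(1, 2)), fixed.std(axis=(1, 2))])
```

Because every utterance is mapped to the same number of frames, the resulting vectors have a fixed length (2 * `n_frames` in this sketch) regardless of how quickly the word was spoken, which is what allows them to be passed to a classifier such as an SVM.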
