Robust geometrical-based lip-reading using Hidden Markov models

Lip reading is the process of recognizing speech from the visible movements of the lips. In this paper, we present a new automatic lip-reading system that classifies dynamic lip movements using geometrical information extracted from video sequences; the system is implemented with four variants of Hidden Markov Models. In recognizing the English digits 0 to 9 as spoken by the subjects in the CUAVE database, the proposed system achieves a word recognition performance of up to 68%, better than that obtained with a conventional appearance-based Discrete Cosine Transform technique. The two approaches are also compared under simulated changes in environmental conditions arising from head movements and alterations in image illumination. The performance of the appearance-based approach was adversely affected by such rotational and brightness changes, whereas the performance of the geometrical-based method remained consistent, demonstrating its potential as part of a multimodal speech recognition system for use in noisy environments.
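The robustness claim can be illustrated with a minimal sketch. It assumes, purely for illustration, that the geometrical information consists of distances between four lip landmarks (mouth corners and the top and bottom lip midpoints); the abstract does not specify the actual feature set, and the function names and coordinates below are hypothetical. Distances between landmarks are unchanged by in-plane rotation, and they do not depend on pixel intensity once the landmarks have been located.

```python
import math

def rotate(points, degrees):
    """Rotate 2-D points about the origin, simulating an in-plane head tilt."""
    t = math.radians(degrees)
    c, s = math.cos(t), math.sin(t)
    return [(x * c - y * s, x * s + y * c) for x, y in points]

def geometric_features(left, right, top, bottom):
    """Hypothetical shape features: mouth width, mouth height, and their ratio."""
    width = math.dist(left, right)
    height = math.dist(top, bottom)
    return width, height, height / width

# Illustrative landmark coordinates in pixels (assumed, not taken from CUAVE):
# left corner, right corner, top midpoint, bottom midpoint.
landmarks = [(-30.0, 0.0), (30.0, 0.0), (0.0, -12.0), (0.0, 12.0)]

upright = geometric_features(*landmarks)
tilted = geometric_features(*rotate(landmarks, 20))

# The two feature vectors agree up to floating-point rounding.
print(upright)
print(tilted)
```

An appearance-based feature computed from raw pixel values, such as a block of DCT coefficients, has no such guarantee, since both rotation and brightness scaling change the pixel values it is computed from. The sketch assumes the landmarks themselves are located correctly under the changed conditions, which in practice depends on the lip-tracking stage.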
