Automatic lip localization under face illumination with shadow consideration

Lip-reading has attractive potential applications in information security, speech recognition, secret communication and so forth. A key issue in building an automatic lip-reading system is how to locate the lip region, particularly under changing illumination conditions. Empirical studies have shown that the recognition rate of a lip-reading system relies greatly on the accuracy of lip localization. Unfortunately, to the best of our knowledge, lip localization under face illumination with shadow consideration has not yet been well solved, and this problem remains one of the major obstacles keeping automatic lip-reading systems from practical application. This paper therefore concentrates on this problem and proposes a new approach that automatically obtains the minimum enclosing rectangle of the mouth from a transformed gray-level image. In this approach, a pre-processing step first reduces the interference caused by shadow and enhances the boundary region of the lip, from which the left and right mouth corners are estimated. Then, by building a binary sequence from the gray-level values along the vertical midline of the mouth, the top and bottom crucial points are estimated. Experiments show that the proposed approach gives promising results in comparison with existing methods.
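The second stage described above (a binary sequence along the vertical midline, then the top and bottom points) can be illustrated with a minimal Python sketch. It is not the paper's implementation: the abstract does not state the binarization rule, so the sketch assumes that lip pixels are darker than the surrounding skin in the transformed gray-level image and thresholds at the mean of the midline column. It also assumes the mouth corners are already known and takes the first and last lip-labelled rows as the top and bottom points. The function names and the toy image are hypothetical.

```python
from typing import List, Optional, Tuple

Point = Tuple[int, int]  # (row, col)


def midline_binary_sequence(gray: List[List[int]], left: Point, right: Point,
                            threshold: Optional[float] = None) -> List[int]:
    """Binarize the gray levels along the vertical midline between the corners.

    Assumption (not from the paper): a pixel counts as lip (1) when it is
    darker than the threshold, which defaults to the mean of the column.
    """
    col = (left[1] + right[1]) // 2           # midline column between corners
    column = [row[col] for row in gray]       # gray levels from top to bottom
    if threshold is None:
        threshold = sum(column) / len(column)
    return [1 if value < threshold else 0 for value in column]


def lip_bounding_box(gray: List[List[int]], left: Point, right: Point,
                     threshold: Optional[float] = None
                     ) -> Optional[Tuple[int, int, int, int]]:
    """Return (top, left_col, bottom, right_col), or None if no lip row is found."""
    bits = midline_binary_sequence(gray, left, right, threshold)
    lip_rows = [i for i, bit in enumerate(bits) if bit == 1]
    if not lip_rows:
        return None
    # Simplification: first and last lip-labelled rows are the crucial points.
    return (lip_rows[0], left[1], lip_rows[-1], right[1])


# Toy 7x7 image: bright skin (200) with a dark mouth band (50) in rows 2..4.
image = [[200] * 7 for _ in range(7)]
for r in range(2, 5):
    for c in range(1, 6):
        image[r][c] = 50

left_corner, right_corner = (3, 1), (3, 5)   # assumed to be already estimated
bits = midline_binary_sequence(image, left_corner, right_corner)
box = lip_bounding_box(image, left_corner, right_corner)
```

On real images a fixed mean threshold is unlikely to be sufficient under shadow, which is why the approach applies shadow-reducing pre-processing before this step.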
