Lip contour segmentation and tracking compliant with lip-reading application constraints

We propose combining active contours and parametric models for lip contour extraction and tracking. In the first image, jumping snakes detect key points on the outer and inner lip contours. These points initialize a parametric lip model composed of several cubic curves suited to mouth deformations. The initial model is then optimized against a combined luminance and chrominance gradient so that it locks precisely onto the lip contours. In subsequent images, segmentation is based on the mouth bounding box and key point tracking. Quantitative and qualitative evaluations show the effectiveness of the algorithm for lip-reading applications.
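To make the model-fitting idea concrete, the following is a minimal sketch, not the authors' implementation. It assumes a single cubic fitted by least squares through already detected key points, and a combined gradient formed as a simple weighted sum of luminance and chrominance gradient magnitudes. The function names, the `weight` parameter, and the scoring rule (mean gradient along the curve) are illustrative assumptions; the jumping-snake detection and the actual optimization procedure are not shown.

```python
import numpy as np

def fit_cubic(points):
    """Least-squares cubic y = a*x^3 + b*x^2 + c*x + d through (x, y) key points."""
    pts = np.asarray(points, dtype=float)
    return np.polyfit(pts[:, 0], pts[:, 1], deg=3)

def combined_gradient(luma, chroma, weight=0.5):
    """Weighted sum of gradient magnitudes (assumed form of the combination)."""
    gy_l, gx_l = np.gradient(luma.astype(float))
    gy_c, gx_c = np.gradient(chroma.astype(float))
    return weight * np.hypot(gx_l, gy_l) + (1.0 - weight) * np.hypot(gx_c, gy_c)

def curve_score(coeffs, grad, x_start, x_end):
    """Mean gradient value sampled along the cubic between two x positions."""
    xs = np.arange(x_start, x_end + 1)
    ys = np.rint(np.polyval(coeffs, xs)).astype(int)
    # Ignore samples that fall outside the image.
    inside = (ys >= 0) & (ys < grad.shape[0]) & (xs >= 0) & (xs < grad.shape[1])
    if not inside.any():
        return 0.0
    return float(grad[ys[inside], xs[inside]].mean())

# Synthetic example: a horizontal edge at row 10 stands in for a lip contour.
luma = np.zeros((20, 20))
luma[10:, :] = 1.0
chroma = luma.copy()
grad = combined_gradient(luma, chroma)

on_edge = fit_cubic([(2, 10), (5, 10), (8, 10), (12, 10), (15, 10)])
off_edge = fit_cubic([(2, 3), (5, 3), (8, 3), (12, 3), (15, 3)])
score_on = curve_score(on_edge, grad, 2, 15)
score_off = curve_score(off_edge, grad, 2, 15)
```

In this toy setting the curve lying on the edge scores higher than the one placed away from it, which is the property an optimizer would exploit when adjusting the curve parameters.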
