Lip Localization on Multi-View Faces in Video

In this paper, a real-time and robust method for multi-view lip localization in video frames is proposed. The objective of this research is to localize the bounding box around the lips with sub-pixel accuracy in low-quality video frames. This is achieved in five steps: multi-view face detection in video frames, lip ROI extraction, color space transformation, vertical edge feature detection, and lip localization by a feature-spatial profile analysis on a joint color-edge plane. We evaluate the method on one possible application, talking face detection in video. Experimental results on TV programs and movies demonstrate that the proposed approach is efficient and robust in lip localization.
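The last three steps of the pipeline can be illustrated with a minimal sketch. It is not the authors' implementation, and every specific choice in it is an assumption made for illustration: the pseudo-hue R/(R+G) stands in for the unspecified color space transformation, a horizontal central difference on luminance stands in for the vertical edge feature, the two planes are combined multiplicatively, and the profiles are cut at a fixed fraction of their range. The function names (`normalize`, `extent`, `localize_lip_box`) and the `frac` parameter are hypothetical. The input is a lip ROI already cropped from a detected face, given as rows of (r, g, b) tuples, and the result is an integer-pixel box, so the sub-pixel refinement is not covered.

```python
def normalize(plane):
    """Min-max scale a 2-D list to [0, 1]; a constant plane maps to all zeros."""
    lo = min(min(row) for row in plane)
    hi = max(max(row) for row in plane)
    span = hi - lo
    return [[(v - lo) / span if span else 0.0 for v in row] for row in plane]


def extent(profile, frac):
    """First and last index whose value exceeds lo + frac * (hi - lo), or None."""
    lo, hi = min(profile), max(profile)
    thresh = lo + frac * (hi - lo)
    hits = [i for i, v in enumerate(profile) if v > thresh]
    return (hits[0], hits[-1]) if hits else None


def localize_lip_box(roi, frac=0.25):
    """Return (x0, y0, x1, y1) of the lip box inside an RGB lip ROI, or None.

    Assumed stand-ins, not taken from the paper: pseudo-hue as the color
    feature, horizontal luminance difference as the vertical edge feature,
    and a multiplicative joint color-edge plane.
    """
    h, w = len(roi), len(roi[0])

    # Color feature: pseudo-hue R / (R + G), larger on lips than on skin.
    hue = normalize([[r / (r + g) if r + g else 0.0 for r, g, b in row]
                     for row in roi])

    # Edge feature: horizontal central difference, which responds to vertical edges.
    lum = [[(r + g + b) / 3.0 for r, g, b in row] for row in roi]
    edge = normalize([[abs(row[min(x + 1, w - 1)] - row[max(x - 1, 0)])
                       for x in range(w)] for row in lum])

    # Joint plane: edges reinforce the response only where the color is lip-like.
    joint = [[hue[y][x] * (1.0 + edge[y][x]) for x in range(w)]
             for y in range(h)]

    # Spatial profiles: sum the joint plane along each row and each column.
    rows = extent([sum(r) for r in joint], frac)
    cols = extent([sum(joint[y][x] for y in range(h)) for x in range(w)], frac)
    if rows is None or cols is None:
        return None
    return (cols[0], rows[0], cols[1], rows[1])
```

Called on a cropped ROI, `localize_lip_box(roi)` returns the inclusive box corners in ROI coordinates, or `None` when the joint plane is flat. The multiplicative combination is one simple way to keep skin-side edge responses from widening the box; an additive combination would be an equally plausible reading of the joint color-edge plane.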
