Robust Speaking Face Identification for Video Analysis

We investigate the problem of automatically identifying speaking faces for video analysis using only the visual information. Intuitively, mouth should be first accurately located in each face, but this is extremely challenging due to the complicated condition in video, such as irregular lighting, changing face poses and low resolution etc. Even though we get the accurate mouth location, it's still very hard to align corresponding mouths. However, we demonstrate that high precision can be achieved by aligning mouths through face matching, which needs no accurate mouth location. The principal novelties that we introduce are: (i) proposing a framework for speaking face identification for video analysis; (ii) detecting the change of the aligned mouth through face matching; (iii) introducing a novel descriptor to describe the change of the mouth. Experimental results on videos demonstrated that the proposed approach is efficient and robust for speaking face identification.

[1]  Robert C. Bolles,et al.  Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography , 1981, CACM.

[2]  Carlo Tomasi,et al.  Good features to track , 1994, 1994 Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.

[3]  Jun Wu,et al.  Tsinghua University at TRECVID 2004: Shot Boundary Detection and High-Level Feature Extraction , 2004, TRECVID.

[4]  Xin Huang,et al.  Shot Boundary Detection and High-Level Features Extraction for the TREC Video Evaluation 2003 , 2003, TRECVID.

[5]  James M. Rehg,et al.  Vision-based speaker detection using Bayesian networks , 1999, Proceedings. 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No PR00149).

[6]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[7]  Andrew Zisserman,et al.  Automatic face recognition for film character retrieval in feature-length films , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[8]  Trevor Darrell,et al.  Visual speech recognition with loosely synchronized feature streams , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[9]  F. Meyer,et al.  Color image segmentation , 1992 .

[10]  Christopher G. Harris,et al.  A Combined Corner and Edge Detector , 1988, Alvey Vision Conference.

[11]  Yuan Li,et al.  Robust Head Tracking with Particles Based on Multiple Cues Fusion , 2006, ECCV Workshop on HCI.

[12]  Andrew Zisserman,et al.  Identifying individuals in video by combining 'generative' and discriminative head models , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[13]  Andrew Zisserman,et al.  Hello! My name is... Buffy'' -- Automatic Naming of Characters in TV Video , 2006, BMVC.

[14]  James M. Rehg,et al.  Computer Vision for Human–Machine Interaction: Visual Sensing of Humans for Active Public Interfaces , 1998 .