On evaluating face tracks in movies

Automatic extraction of face tracks is a key component of systems that analyse people in audio-visual content such as TV programs and movies. Because properly annotated content of this type is scarce, popular algorithms for extracting face tracks have not been fully assessed in the literature. To help fill this gap, we introduce and make publicly available a new dataset based on the full annotation of a feature-length movie. We show in particular that, thanks to this dataset, state-of-the-art tracking metrics can now be used to evaluate the face tracks that feed, e.g., automatic character naming systems. We conduct such an evaluation on several variants of a novel system that we introduce as a generalization of existing ones.