Face Retrieval in Large-Scale News Video Datasets

Face retrieval in news video is a challenging task because of the large variations in the visual appearance of the human face. Although several approaches have been proposed to deal with this problem, their extremely high computational cost limits their scalability to large-scale video datasets that may contain millions of faces of hundreds of characters. In this paper, we introduce approaches for face retrieval that scale to such datasets while maintaining performance competitive with state-of-the-art approaches. To exploit the variability of face appearances in video, we use a set of face images, called a face-track, to represent the appearance of a character in a video shot. Our first proposal is an approach for extracting face-tracks. We use a point tracker to explore the connections between detected faces belonging to the same character and then group them into one face-track. We present techniques to make the approach robust against common problems caused by flash lights, partial occlusions, and scattered appearances of characters in news videos. In the second proposal, we introduce an efficient approach to match face-tracks for retrieval. Instead of using all the faces in the face-tracks to compute their similarity, our approach obtains a representative face for each face-track. The representative face is computed from faces that are sampled from the original face-track. As a result, we significantly reduce the computational cost of face-track matching while taking into account the variability of faces in face-tracks to achieve high matching accuracy. Experiments are conducted on two face-track datasets extracted from real-world news videos, at scales that have not previously been considered in the literature. One dataset contains 1,497 face-tracks of 41 characters extracted from 370 hours of TRECVID videos. The other dataset provides 5,567 face-tracks of 111 characters observed from a television news program (NHK News 7) over 11 years. We make both datasets publicly accessible to the research community. The experimental results show that our proposed approaches achieve a remarkable balance between accuracy and efficiency.

key words: face-track extraction, face-track matching, large-scale, news video
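
The matching idea described in the abstract (one representative face per face-track, computed from sampled faces) can be illustrated with a minimal sketch. The abstract does not specify how faces are sampled, how the representative is computed, or which distance is used, so everything below is an assumption made for illustration: faces are given as plain feature vectors, `k` faces are sampled at evenly spaced positions, the representative is their mean, and tracks are compared by Euclidean distance. The function names are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def sample_indices(n: int, k: int) -> List[int]:
    """Pick up to k evenly spaced positions in a track of n faces (assumed scheme)."""
    if n <= 0 or k <= 0:
        raise ValueError("track length and sample count must be positive")
    k = min(k, n)
    # Take the centre of each of k equal-width segments of the track.
    return [int((i + 0.5) * n / k) for i in range(k)]


def representative_face(track: List[Vector], k: int) -> List[float]:
    """Mean feature vector of the sampled faces (assumed representative)."""
    sampled = [track[i] for i in sample_indices(len(track), k)]
    dim = len(sampled[0])
    return [sum(face[d] for face in sampled) / len(sampled) for d in range(dim)]


def track_distance(track_a: List[Vector], track_b: List[Vector], k: int) -> float:
    """Euclidean distance between the two representative faces (assumed metric)."""
    rep_a = representative_face(track_a, k)
    rep_b = representative_face(track_b, k)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(rep_a, rep_b)))


def rank_tracks(query: List[Vector], database: List[List[Vector]], k: int) -> List[int]:
    """Return database indices ordered from most to least similar to the query."""
    distances = [track_distance(query, track, k) for track in database]
    return sorted(range(len(database)), key=lambda i: distances[i])
```

With this scheme, comparing two face-tracks costs a single vector distance once the representatives are known, rather than one distance for every pair of faces across the two tracks. In a retrieval setting the representatives of the database tracks would normally be computed once and stored; the sketch recomputes them on each call only to stay short.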