Hierarchical Tree Representation Based Face Clustering for Video Retrieval

We represent a video as a set of people, where each person is a sequence of faces clustered using the proposed hierarchical tree representation, with the aim of finding all occurrences of a person in the video without the help of any textual information. In the proposed method, faces in a video are first detected and tracked to form face-tracks, each of which is associated with one person. By leveraging temporal constraints, face-tracks that depict the same person are connected. We then build undirected graphs for the video and extend discriminative histogram intersection metric learning to generate semantic distances, which are used to cut the graphs into face clusters without predefining the number of clusters. When searching for videos that contain the queried person, it is effective to compare the faces in the query video with the sets of people summarized from the videos in the dataset. Experimental results show that the proposed face clustering improves the mean Average Precision (mAP) of video retrieval and reduces query time compared with several state-of-the-art approaches.
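The abstract does not give the distance or the graph-cutting rule at code level, so the following is only a simplified sketch under stated assumptions, not the authors' hierarchical tree method. It assumes each face is described by an L1-normalised histogram (for example a local binary pattern histogram), uses plain histogram intersection with an optional per-bin weight vector `w` as a hypothetical placeholder for what a learned histogram intersection metric would supply, and cuts the graph by dropping edges at or above a distance threshold and taking connected components, which corresponds to cutting a single-link hierarchy at that height. All function names (`histogram_intersection`, `track_distance`, `cluster_tracks`, `rank_videos`) are illustrative. The point is to show how thresholded clustering yields a number of people that is not fixed in advance, and how retrieval can then compare query faces with the people summarized from each video.

```python
from collections import defaultdict

import numpy as np


def histogram_intersection(h1, h2, w=None):
    """Weighted histogram intersection: sum_i w_i * min(h1_i, h2_i).

    `w` is a hypothetical stand-in for learned per-bin weights; None means uniform.
    For L1-normalised histograms with uniform weights the result lies in [0, 1].
    """
    m = np.minimum(h1, h2)
    return float(m.sum() if w is None else np.dot(w, m))


def track_distance(faces_a, faces_b, w=None):
    """Assumed set-to-set distance: 1 minus the best face-to-face intersection."""
    best = max(histogram_intersection(a, b, w) for a in faces_a for b in faces_b)
    return 1.0 - best


def cluster_tracks(tracks, threshold, w=None):
    """Cut a complete graph over face-tracks into clusters.

    Every edge with distance >= threshold is dropped; the connected components
    of what remains are the clusters, so their number is not fixed in advance.
    """
    parent = list(range(len(tracks)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps trees shallow
            i = parent[i]
        return i

    for i in range(len(tracks)):
        for j in range(i + 1, len(tracks)):
            if track_distance(tracks[i], tracks[j], w) < threshold:
                parent[find(i)] = find(j)  # union the two components

    groups = defaultdict(list)
    for i in range(len(tracks)):
        groups[find(i)].append(i)
    return sorted(groups.values())


def rank_videos(query_faces, videos, w=None):
    """Rank videos by the distance of their closest summarized person to the query.

    `videos` maps a video id to a list of people, each a list of face histograms.
    """
    scores = {
        vid: min(track_distance(query_faces, person, w) for person in people)
        for vid, people in videos.items()
    }
    return sorted(scores, key=scores.get)


if __name__ == "__main__":
    a = np.array([0.7, 0.2, 0.1])  # toy histogram for person A
    b = np.array([0.1, 0.2, 0.7])  # toy histogram for person B
    tracks = [[a], [0.9 * a + 0.1 * b], [b], [0.95 * b + 0.05 * a]]
    clusters = cluster_tracks(tracks, threshold=0.3)
    print(clusters)  # [[0, 1], [2, 3]]
    people = [[face for i in c for face in tracks[i]] for c in clusters]
    print(rank_videos([a], {"v2": [[b]], "v1": people}))  # ['v1', 'v2']
```

Taking the maximum face-to-face similarity makes the track distance itself single-link; a mean over face pairs or a learned set distance would be equally plausible choices, and the threshold plays the role of the cut height that decides how many people emerge.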