Automatic face annotation in TV series by video/script alignment

This paper describes a method for automatically tagging faces collected from uncontrolled TV-series videos with the names of the characters. The detected faces are first partitioned into clusters. We then construct a face sequence by ordering the faces by their occurrence in the video and representing each face by its cluster label. It can reasonably be assumed that the temporal distribution of the faces in the video roughly follows the temporal distribution of the names in the script. Hence, we propose to annotate the faces by video/script alignment: a global sequence alignment algorithm finds the faces in the face sequence that most probably match the names in the name sequence. The novelty lies in considering the temporal order of the faces and names over the whole video and in directly aligning two heterogeneous sequences. Experiments on real-world videos demonstrate the effectiveness and efficiency of the proposed method.
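The pipeline above can be illustrated with a minimal Python sketch. It assumes the classic Needleman-Wunsch global alignment with a constant gap penalty. It also assumes a precomputed compatibility score between each cluster label and each name, which the abstract does not specify. The final majority vote that turns aligned pairs into one name per cluster is likewise an assumption. The function names `build_face_sequence`, `global_align` and `label_clusters`, and the parameters `score` and `gap`, are hypothetical and are not taken from the paper.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional, Sequence, Tuple

Pair = Tuple[Optional[int], Optional[str]]


def build_face_sequence(detections: Sequence[Tuple[float, int]]) -> List[int]:
    """Order (timestamp, cluster_label) detections by time; keep only the labels."""
    return [label for _, label in sorted(detections)]


def global_align(faces: Sequence[int], names: Sequence[str],
                 score: Dict[Tuple[int, str], float], gap: float = -1.0) -> List[Pair]:
    """Needleman-Wunsch alignment of a face-label sequence with a name sequence.

    score[(label, name)] is an assumed (hypothetical) compatibility score;
    missing pairs score 0. gap is a hypothetical constant skip penalty.
    """
    n, m = len(faces), len(names)
    # dp[i][j] = best score for aligning faces[:i] with names[:j]
    dp = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        dp[i][0] = dp[i - 1][0] + gap
    for j in range(1, m + 1):
        dp[0][j] = dp[0][j - 1] + gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            match = dp[i - 1][j - 1] + score.get((faces[i - 1], names[j - 1]), 0.0)
            dp[i][j] = max(match, dp[i - 1][j] + gap, dp[i][j - 1] + gap)
    # Trace back from the bottom-right cell to recover one optimal alignment.
    pairs: List[Pair] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and dp[i][j] == dp[i - 1][j - 1] + score.get(
                (faces[i - 1], names[j - 1]), 0.0):
            pairs.append((faces[i - 1], names[j - 1]))  # face matched to name
            i, j = i - 1, j - 1
        elif i > 0 and (j == 0 or dp[i][j] == dp[i - 1][j] + gap):
            pairs.append((faces[i - 1], None))  # face left unnamed
            i -= 1
        else:
            pairs.append((None, names[j - 1]))  # name with no matching face
            j -= 1
    pairs.reverse()
    return pairs


def label_clusters(pairs: Sequence[Pair]) -> Dict[int, str]:
    """Assumed post-step: give each cluster the name it was aligned to most often."""
    votes: Dict[int, Counter] = defaultdict(Counter)
    for label, name in pairs:
        if label is not None and name is not None:
            votes[label][name] += 1
    return {label: c.most_common(1)[0][0] for label, c in votes.items()}


detections = [(12.0, 2), (3.5, 0), (7.1, 1), (15.0, 2)]
faces = build_face_sequence(detections)  # [0, 1, 2, 2]
names = ["alice", "bob", "carol"]
score = {(0, "alice"): 2.0, (1, "bob"): 2.0, (2, "carol"): 2.0}
pairs = global_align(faces, names, score)
print(pairs)  # [(0, 'alice'), (1, 'bob'), (2, None), (2, 'carol')]
print(label_clusters(pairs))  # {0: 'alice', 1: 'bob', 2: 'carol'}
```

In this toy example there are four faces but only three names, so one of the two label-2 faces is aligned to a gap. Because the alignment is global, the temporal order of faces and names is preserved over the whole sequence rather than matched locally.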