Character Identification in TV-series via Non-local Cost Aggregation

We propose a non-local cost aggregation algorithm to recognize the identities of face and person tracks in a TV series. In our approach, the fundamental element for identification is a track node, which is built on top of face and person tracks. Track nodes with temporal dependency are grouped into a knot. These knots then serve as the basic units in the construction of a k-knot graph for exploring the video structure. We build the minimum-distance spanning tree (MST) from the k-knot graph such that track nodes of similar appearance are adjacent to each other in the MST. Non-local cost aggregation is performed on the MST, which ensures that information from face and person tracks is used as a whole to improve identification performance. Identification is performed by minimizing the cost of each knot, which takes into account the unique presence of a subject in a venue. Experimental results demonstrate the effectiveness of our method.
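The abstract does not spell out how the aggregation is computed. The Python below is a minimal sketch under stated assumptions, not the authors' implementation. It assumes the two-pass tree aggregation used in Yang's non-local stereo matching method, where the aggregated cost of node v for identity l is the sum over all nodes u of exp(-D(u, v)/σ) · C(u, l), with D(u, v) the path length between u and v in the MST. A plain weighted edge list stands in for the k-knot graph (its construction is not shown and the graph is assumed connected), the MST is built with Kruskal's algorithm, and the function names, σ, and toy costs are hypothetical. The knot-level minimization under the unique-presence constraint is not sketched.

```python
import math
from collections import deque


def kruskal_mst(n, edges):
    """Minimum spanning tree via Kruskal's algorithm.

    n: number of track nodes; edges: list of (u, v, distance) tuples
    (hypothetical stand-in for the k-knot graph).
    Returns an adjacency list: adj[u] = [(v, distance), ...].
    """
    parent = list(range(n))

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    adj = [[] for _ in range(n)]
    for u, v, w in sorted(edges, key=lambda e: e[2]):
        ru, rv = find(u), find(v)
        if ru != rv:  # keep the edge only if it joins two components
            parent[ru] = rv
            adj[u].append((v, w))
            adj[v].append((u, w))
    return adj


def aggregate_on_tree(adj, cost, sigma=1.0, root=0):
    """Non-local cost aggregation on a tree (assumed two-pass scheme).

    cost[v] is a list of per-identity costs for node v. Returns agg with
    agg[v][l] = sum_u exp(-D(u, v) / sigma) * cost[u][l], where D is the
    path length between u and v in the tree.
    """
    n, num_labels = len(adj), len(cost[0])
    par, sim, order = [-1] * n, [0.0] * n, []
    seen = [False] * n
    seen[root] = True
    queue = deque([root])
    while queue:  # BFS yields a parent-before-child ordering
        u = queue.popleft()
        order.append(u)
        for v, w in adj[u]:
            if not seen[v]:
                seen[v] = True
                par[v], sim[v] = u, math.exp(-w / sigma)
                queue.append(v)

    # Pass 1 (leaves to root): each node accumulates its weighted subtree cost.
    up = [list(c) for c in cost]
    for v in reversed(order):
        if par[v] >= 0:
            for l in range(num_labels):
                up[par[v]][l] += sim[v] * up[v][l]

    # Pass 2 (root to leaves): add the contribution from outside the subtree.
    agg = [list(c) for c in up]
    for v in order:
        if par[v] >= 0:
            s = sim[v]
            for l in range(num_labels):
                agg[v][l] = s * agg[par[v]][l] + (1 - s * s) * up[v][l]
    return agg


if __name__ == "__main__":
    # Hypothetical toy input: 4 track nodes, appearance distances on edges,
    # and per-node costs for 2 candidate identities.
    edges = [(0, 1, 0.2), (1, 2, 0.3), (0, 2, 0.9), (2, 3, 1.5)]
    cost = [[0.1, 0.9], [0.5, 0.5], [0.4, 0.6], [0.9, 0.1]]
    agg = aggregate_on_tree(kruskal_mst(4, edges), cost, sigma=0.5)
    for v, a in enumerate(agg):
        print(v, [round(x, 3) for x in a])
```

Because a tree has exactly one path between any two nodes, each pass visits every edge once, so aggregating over N track nodes and L identities costs O(N·L) rather than the O(N²·L) of a direct pairwise sum.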

Rama Chellappa | Ching-Hui Chen