Context-Oriented Name-Face Association in Web Videos

Automatically linking faces in Web videos with the names scattered in their surrounding text (e.g., the user-generated title and tags) is an important task for many applications. Traditionally, this task is accomplished either by jointly exploring visual-textual consistency under constraints, or by leveraging external resources, e.g., public facial images. This paper follows the second paradigm and implements name-face association by matching faces appearing in Web videos against carefully collected Web facial images. Specifically, given a Web video, we first identify the relevant and discriminative tags from its surrounding text. These tags are defined as Contextual Tags (CTags) because they roughly give the semantic context of the video (e.g., who is doing what, when, and where). Then, facial images are retrieved by issuing assembled text queries to a commercial search engine, where each query contains a detected name and one of the top CTags. In this way, we crawl facial images that are highly relevant to the person in the video context, so the task of name-face association reduces to matching faces. Compared with traditional methods, our novelty lies in exploring both the visual content of the video and the crowdsourced text of its context in order to find more specific facial images from the Web to facilitate the association. Experimental results on real-world Web videos containing faces and celebrity names show that the proposed method outperforms several existing methods.
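
The two steps described above, query assembly and face matching, can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`build_queries`, `cosine`, `associate`), the `top_k` and `threshold` parameters, the use of cosine similarity over face feature vectors, and the mean-similarity scoring rule are all assumptions made for illustration. CTag ranking, the search-engine crawl, and face feature extraction are assumed to happen elsewhere.

```python
from math import sqrt


def build_queries(names, ctags, top_k=2):
    """Pair every detected name with each of the top_k ranked CTags.

    `ctags` is assumed to be already sorted from most to least relevant.
    """
    return [f"{name} {tag}" for name in names for tag in ctags[:top_k]]


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = sqrt(sum(a * a for a in u)) * sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def associate(face, gallery, threshold=0.5):
    """Assign a video face to the best-matching name, or None.

    `gallery` maps each candidate name to the feature vectors of the Web
    facial images crawled for that name (hypothetical data layout).
    A name wins if its mean similarity is the highest and exceeds
    `threshold`; otherwise the face is left unnamed.
    """
    best_name, best_score = None, threshold
    for name, feats in gallery.items():
        if not feats:
            continue  # no crawled images for this name
        score = sum(cosine(face, f) for f in feats) / len(feats)
        if score > best_score:
            best_name, best_score = name, score
    return best_name
```

For example, `build_queries(["Name A"], ["concert", "london", "2014"])` yields one query per top CTag for that name, and `associate` can then be called once per detected video face with the gallery crawled from those queries. Returning `None` below the threshold reflects the common case where a face in the video belongs to nobody named in the surrounding text.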
