Unsupervised face-name association via commute distance

Recently, the task of unsupervised face-name association has received considerable interest in the multimedia and information retrieval communities. It differs from the generic facial image annotation problem because it is unsupervised and its assignments are ambiguous. Specifically, face-name association must obey the following three constraints: (1) a face can only be assigned to a name appearing in its associated caption or to null; (2) a name can be assigned to at most one face; and (3) a face can be assigned to at most one name. Many methods have been proposed to tackle this task, but they share some common problems, e.g., many of them are computationally expensive and have difficulty making the null assignment decision. In this paper, we design a novel framework named face-name association via commute distance (FACD), which judges face-name and face-null assignments under a unified framework via the commute distance (CD) algorithm. Then, to further speed up the on-line processing, we propose a novel anchor-based commute distance (ACD) algorithm, whose main idea is to use the anchor point representation structure to accelerate the eigen-decomposition of the adjacency matrix of a graph. Systematic experimental results on a large-scale, real-world image-caption database with a total of 194,046 detected faces and 244,725 names show that our proposed approach outperforms many state-of-the-art methods. Our framework is suitable for large-scale, real-time systems.
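
For readers unfamiliar with commute distance, the following is a minimal sketch of the standard textbook definition on a small graph, not the FACD or ACD algorithm itself. It assumes a connected, undirected graph given by a symmetric weight matrix, and uses the identity CD(i, j) = vol(G) (L+_ii + L+_jj - 2 L+_ij), where L+ is the pseudo-inverse of the graph Laplacian. The function name `commute_distances` and the toy graph are illustrative assumptions.

```python
import numpy as np

def commute_distances(W):
    """Pairwise commute distances for a connected, undirected weighted graph.

    W is a symmetric, non-negative weight (adjacency) matrix.
    Illustrative helper, not taken from the paper.
    """
    W = np.asarray(W, dtype=float)
    degrees = W.sum(axis=1)            # weighted degree of each node
    volume = degrees.sum()             # vol(G): sum of all degrees
    laplacian = np.diag(degrees) - W   # unnormalized graph Laplacian L = D - W
    l_pinv = np.linalg.pinv(laplacian) # Moore-Penrose pseudo-inverse L+
    diag = np.diag(l_pinv)
    # CD(i, j) = vol(G) * (L+_ii + L+_jj - 2 * L+_ij)
    return volume * (diag[:, None] + diag[None, :] - 2.0 * l_pinv)

# Toy example (assumed data): a path graph 0 - 1 - 2 with unit weights.
W = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
cd = commute_distances(W)
print(np.round(cd, 6))
```

On this path graph the two end nodes are farther apart in commute distance than either is from the middle node, which matches the intuition that a random walk takes longer to travel between them and return. The dense pseudo-inverse costs cubic time in the number of nodes, which is the kind of cost an anchor-based approximation aims to avoid.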
