论文信息 - Automatic face naming with caption-based supervision

Automatic face naming with caption-based supervision

We consider two scenarios of naming people in databases of news photos with captions: (i) finding faces of a single person, and (ii) assigning names to all faces. We combine an initial text-based step, that restricts the name assigned to a face to the set of names appearing in the caption, with a second step that analyzes visual features of faces. By searching for groups of highly similar faces that can be associated with a name, the results of purely text-based search can be greatly ameliorated. We improve a recent graph-based approach, in which nodes correspond to faces and edges connect highly similar faces. We introduce constraints when optimizing the objective function, and propose improvements in the low-level methods used to construct the graphs. Furthermore, we generalize the graph-based approach to face naming in the full data set. In this multi-person naming case the optimization quickly becomes computationally demanding, and we present an important speed-up using graph-flows to compute the optimal name assignments in documents. Generative models have previously been proposed to solve the multi-person naming task. We compare the generative and graph-based methods in both scenarios, and find significantly better performance using the graph-based methods in both cases.

[1] Cordelia Schmid,et al. Local Grayvalue Invariants for Image Retrieval , 1997, IEEE Trans. Pattern Anal. Mach. Intell..

[2] Takeo Kanade,et al. Name-It: association of face and name in video , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[3] Moses Charikar,et al. Greedy approximation algorithms for finding dense components in a graph , 2000, APPROX.

[4] David A. Forsyth,et al. Matching Words and Pictures , 2003, J. Mach. Learn. Res..

[5] Michael I. Jordan,et al. Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res..

[6] G LoweDavid,et al. Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[7] Yee Whye Teh,et al. Names and faces in the news , 2004, CVPR 2004.

[8] Cordelia Schmid,et al. Human Detection Based on a Probabilistic Assembly of Robust Part Detectors , 2004, ECCV.

[9] Pietro Perona,et al. Learning object categories from Google's image search , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[10] Marie-Francine Moens,et al. Efficient Hierarchical Entity Classifier Using Conditional Random Fields , 2006, OntologyLearning@COLING/ACL.

[11] Pinar Duygulu Sahin,et al. A Graph Based Approach for Naming Faces in News Photos , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[12] Andrew Zisserman,et al. Hello! My name is... Buffy'' -- Automatic Naming of Characters in TV Video , 2006, BMVC.

[13] David A. Forsyth,et al. Animals on the Web , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[14] Andrew McCallum,et al. People-LDA: Anchoring Topics to People using Face Recognition , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[15] Ron Bekkerman,et al. Multi-modal Clustering for Multimedia Collections , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[16] Bill Triggs,et al. Region Classification with Markov Field Aspect Models , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[17] Xiaoyang Tan,et al. Enhanced Local Texture Feature Sets for Face Recognition Under Difficult Lighting Conditions , 2007, IEEE Transactions on Image Processing.