People-LDA: Anchoring Topics to People using Face Recognition

Topic models have recently emerged as powerful tools for modeling topical trends in documents. Often the resulting topics are broad and generic, associating large groups of people and issues that are loosely related. In many cases, it may be desirable to influence the direction in which topic models develop. In this paper, we explore the idea of centering topics around people. In particular, given a large corpus of images featuring collections of people and associated captions, it seems natural to extract topics specifically focussed on each person. What words are most associated with George Bush? Which with Condoleezza Rice? Since people play such an important role in life, it is natural to anchor one topic to each person. In this paper, we present People-LDA, which uses the coherence efface images in news captions to guide the development of topics. In particular, we show how topics can be refined to be more closely related to a single person (like George Bush) rather than describing groups of people in a related area (like politics). To do this we introduce a new graphical model that tightly couples images and captions through a modern face recognizer. In addition to producing topics that are people specific (using images as a guiding force), the model also performs excellent soft clustering efface images, using the language model to boost performance. We present a variety of experiments comparing our method to recent developments in topic modeling and joint image-language modeling, showing that our model has lower perplexity for face identification than competing models and produces more refined topics.

[1]  Alex Pentland,et al.  Bayesian face recognition , 2000, Pattern Recognit..

[2]  Pietro Perona,et al.  A Bayesian hierarchical model for learning natural scene categories , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[3]  John D. Lafferty,et al.  Correlated Topic Models , 2005, NIPS.

[4]  Wei Li,et al.  Early results for Named Entity Recognition with Conditional Random Fields, Feature Induction and Web-Enhanced Lexicons , 2003, CoNLL.

[5]  Erik G. Learned-Miller,et al.  Building a classification cascade for visual identification from one example , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[6]  Rama Chellappa,et al.  Discriminant analysis of principal components for face recognition , 1998, Proceedings Third IEEE International Conference on Automatic Face and Gesture Recognition.

[7]  Charles Elkan,et al.  Clustering documents with an exponential-family approximation of the Dirichlet compound multinomial distribution , 2006, ICML.

[8]  Matthew J. Beal Variational algorithms for approximate Bayesian inference , 2003 .

[9]  Michael I. Jordan,et al.  Modeling annotated data , 2003, SIGIR.

[10]  Michael I. Jordan,et al.  Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res..

[11]  Erik G. Learned-Miller,et al.  Discriminative Training of Hyper-feature Models for Object Identification , 2006, BMVC.

[12]  David A. Forsyth,et al.  Matching Words and Pictures , 2003, J. Mach. Learn. Res..

[13]  Paul A. Viola,et al.  Robust Real-Time Face Detection , 2001, International Journal of Computer Vision.

[14]  Yee Whye Teh,et al.  Names and faces in the news , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..