Cross-Modality Automatic Face Model Training from Large Video Databases

Face recognition is an important issue in video indexing and retrieval applications. Supervised learning is usually used to build face models for specific named individuals. However, a traditional supervised learning framework requires a huge amount of labeling work. In this paper, we propose an unsupervised cross-modality training scheme that uses automatic speech recognition (ASR) of videos to build visual face models. Building on Multiple-Instance Learning algorithms, we introduce the novel concepts of "Quasi-Positive bags" and "Extended Diverse Density" and use them to develop an automatic training scheme. We also propose using the "Relative Sparsity" of a cluster to detect the anchorperson in news videos. Experiments show that our algorithm obtains correct models for the persons of interest. The automatically learned models are tested on face recognition in large news video databases and compared with a supervised learning algorithm, with promising results.
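The abstract builds on Multiple-Instance Learning, where training data come as labeled bags of unlabeled instances, and on Diverse Density (Maron and Lozano-Pérez), which scores a candidate concept point by how close it lies to at least one instance of every positive bag and how far it lies from every instance of the negative bags. The paper's Quasi-Positive bags and Extended Diverse Density are not defined in this summary, so the following is only a minimal Python sketch of the standard noisy-or Diverse Density, not the authors' method. The function names, the per-dimension `scale` weights, and the toy 2-D feature vectors are hypothetical; for illustration, each positive bag is assumed to hold the face features from shots whose ASR transcript mentions a given name.

```python
import numpy as np

TINY = 1e-300  # floor so np.log never receives an exact zero


def instance_probs(bag, t, scale):
    """Pr(instance matches concept t) for every row of `bag`, as exp(-||scale * (x - t)||^2)."""
    d2 = np.sum((scale * (bag - t)) ** 2, axis=1)
    return np.exp(-d2)


def log_diverse_density(t, pos_bags, neg_bags, scale):
    """Log of the standard noisy-or Diverse Density at point t (not Extended DD)."""
    log_dd = 0.0
    for bag in pos_bags:
        # Positive bag: at least one instance should lie near t (noisy-or).
        p = 1.0 - np.prod(1.0 - instance_probs(bag, t, scale))
        log_dd += np.log(max(p, TINY))
    for bag in neg_bags:
        # Negative bag: every instance should lie far from t.
        p = np.prod(1.0 - instance_probs(bag, t, scale))
        log_dd += np.log(max(p, TINY))
    return log_dd


def best_concept(pos_bags, neg_bags, scale=None):
    """Return the positive-bag instance with the highest Diverse Density.

    Hypothetical simplification: only the positive instances themselves are
    scored; no gradient ascent is run from them.
    """
    candidates = np.vstack(pos_bags)
    if scale is None:
        scale = np.ones(candidates.shape[1])
    scores = [log_diverse_density(c, pos_bags, neg_bags, scale) for c in candidates]
    return candidates[int(np.argmax(scores))]


# Hypothetical toy data: each positive bag holds one face near (1, 1)
# plus an unrelated face; the negative bag holds only unrelated faces.
pos_bags = [
    np.array([[1.0, 1.0], [5.0, 5.0]]),
    np.array([[1.1, 0.9], [-4.0, 3.0]]),
    np.array([[0.9, 1.1], [5.0, 5.2]]),
]
neg_bags = [np.array([[5.0, 5.1], [-4.0, 3.1]])]
print(best_concept(pos_bags, neg_bags))  # -> [1. 1.]
```

Scoring only the observed positive instances keeps the sketch short. The original Diverse Density formulation instead runs gradient ascent from each positive instance, optionally also learning the feature scaling, to locate the maximum.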

Ching-Yung Lin | Ming-Ting Sun | Xiaodan Song
