PERCOLI: A Person Identification System for the 2013 REPERE Challenge

The goal of the PERCOL project is to participate in the REPERE multimodal evaluation program by building a consortium that combines different scientific fields (audio, text and video) in order to perform person recognition in video documents. The two main scientific challenges we address are, first, multimodal fusion algorithms for automatic person recognition in video broadcasts and, second, the improvement of information extraction from speech and images through a combined decoding that uses both modalities to reduce decoding ambiguities.
