Nommage non supervisé des personnes dans les émissions de télévision. Utilisation des noms écrits, des noms prononcés ou des deux ?

L'identification de personnes dans les emissions de television est un outil precieux pour l'indexation de ce type de videos mais l'utilisation de modeles biometriques n'est pas une option viable sans connaissance a priori des personnes presentes dans les videos. Les noms prononces ou ecrits peuvent nous fournir une liste de noms hypotheses. Nous proposons une comparaison du potentiel de ces deux modalites (noms prononces ou ecrits) afin d'extraire le nom des personnes parlant et/ou apparaissant. Les noms prononces proposent un plus grand nombre d'occurrences de citation mais les erreurs de transcription et de detection de ces noms reduisent de moitie le potentiel de cette modalite. Les noms ecrits beneficient d'une amelioration croissante de la qualite des videos et sont plus facilement detectes. Par ailleurs, l'affiliation aux locuteurs/visages des noms ecrits reste plus simple que pour les noms prononces.

[1]  Georges Quénot,et al.  Unsupervised naming of speakers in broadcast TV: using written names, pronounced names or both? , 2013, INTERSPEECH.

[2]  Frédéric Béchet,et al.  Detecting person presence in TV shows with linguistic and structural features , 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[3]  Sylvain Meignier,et al.  Identification of Speakers by Name Using Belief Functions , 2010, IPMU.

[4]  Yannick Estève,et al.  Reconnaissance Automatique de Locuteurs à l'aide de Fonctions de Croyance , 2010 .

[5]  Sophie Rosset,et al.  Models Cascade for Tree-Structured Named Entity Detection , 2011, IJCNLP.

[6]  Yoram Singer,et al.  Improved Boosting Algorithms Using Confidence-rated Predictions , 1998, COLT' 98.

[7]  Georges Quénot,et al.  Fusion of Speech, Faces and Text for Person Identification in TV Broadcast , 2012, ECCV Workshops.

[8]  Sue Tranter Who Really Spoke When? Finding Speaker Turns and Identities in Broadcast News Audio , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.

[9]  Takeo Kanade,et al.  Name-It: Naming and Detecting Faces in News Videos , 1999, IEEE Multim..

[10]  Georges Quénot,et al.  Unsupervised Speaker Identification using Overlaid Texts in TV Broadcast , 2012, INTERSPEECH.

[11]  Olivier Galibert,et al.  The REPERE Corpus : a multimodal corpus for person recognition , 2012, LREC.

[12]  Alexandre Allauzen,et al.  Training and Evaluation of POS Taggers on the French MULTITAG Corpus , 2008, LREC.

[13]  Olivier Galibert,et al.  The LIMSI Participation in the QAst 2009 Track: Experimenting on Answer Scoring , 2009, CLEF.

[14]  L. Lamel,et al.  A comparative study using manual and automatic transcriptions for diarization , 2005, IEEE Workshop on Automatic Speech Recognition and Understanding, 2005..

[15]  Jean-Luc Gauvain,et al.  Partitioning and transcription of broadcast news data , 1998, ICSLP.

[16]  Takeo Kanade,et al.  Name-It: Naming and Detecting Faces in Video by the Integration of Image and Natural Language Processing , 1997, IJCAI.

[17]  Georges Quénot,et al.  From Text Detection in Videos to Person Identification , 2012, 2012 IEEE International Conference on Multimedia and Expo.

[18]  Qingming Huang,et al.  Naming faces in broadcast news video by image google , 2008, ACM Multimedia.

[19]  Rong Yan,et al.  Multiple instance learning for labeling faces in broadcasting news video , 2005, MULTIMEDIA '05.

[20]  Sylvain Meignier,et al.  Automatic named identification of speakers using diarization and ASR systems , 2009, 2009 IEEE International Conference on Acoustics, Speech and Signal Processing.

[21]  Thomas G. Dietterich,et al.  Solving the Multiple Instance Problem with Axis-Parallel Rectangles , 1997, Artif. Intell..

[22]  Georges Quénot,et al.  Towards a Better Integration of Written Names for Unsupervised Speakers Identification in Videos , 2013, SLAM@INTERSPEECH.

[23]  Georges Quénot,et al.  Nommage non-supervisé des personnes dans les émissions de télévision : une revue du potentiel de chaque modalité , 2014, CORIA.

[24]  Delphine Charlet,et al.  Unsupervised face identification in TV content using audio-visual sources , 2013, 2013 11th International Workshop on Content-Based Multimedia Indexing (CBMI).

[25]  Jun Yang,et al.  Naming every individual in news video monologues , 2004, MULTIMEDIA '04.

[26]  Hervé Bredin,et al.  Integer linear programming for speaker diarization and cross-modal identification in TV broadcast , 2013, INTERSPEECH.

[27]  Ching-Yung Lin,et al.  Cross-Modality Automatic Face Model Training from Large Video Databases , 2004, 2004 Conference on Computer Vision and Pattern Recognition Workshop.

[28]  Elie el Khoury,et al.  Combining transcription-based and acoustic-based speaker identifications for broadcast news , 2012, 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).

[29]  Stéphane Ayache,et al.  Speaker Identity Indexing In Audio-Visual Documents , 2005 .

[30]  Ngoc Thang Vu,et al.  Speech recognition for machine translation in Quaero , 2011, IWSLT.

[31]  Julie Mauclair,et al.  Speaker Diarization: About whom the Speaker is Talking ? , 2006, 2006 IEEE Odyssey - The Speaker and Language Recognition Workshop.

[32]  Paul Deléglise,et al.  Extracting true speaker identities from transcriptions , 2007, INTERSPEECH.

[33]  Ricky Houghton Named Faces: Putting Names to Faces , 1999, IEEE Intell. Syst..

[34]  Jean-Luc Gauvain,et al.  The LIMSI Broadcast News transcription system , 2002, Speech Commun..

[35]  Jean-Luc Gauvain,et al.  Speaker diarization from speech transcripts , 2004, INTERSPEECH.

[36]  François Yvon,et al.  Practical Very Large Scale CRFs , 2010, ACL.

[37]  Takeo Kanade,et al.  Name-It: association of face and name in video , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[38]  Georges Quénot,et al.  QCompere @ REPERE 2013 , 2013, SLAM@INTERSPEECH.