Computer-assisted speaker diarization: how to evaluate human corrections

In this paper, we present a simulator-based framework for evaluating human corrections of a speaker diarization. We propose four elementary actions for correcting a diarization and an automaton that simulates the sequence of corrections. A metric is proposed to evaluate the correction cost. The framework is evaluated on French broadcast news drawn from the REPERE corpus.
