Paper Information - Computer-assisted Speaker Diarization: How to Evaluate Human Corrections

In this paper, we present a framework for evaluating human corrections of automatic speaker diarization output. We propose four elementary actions for correcting a diarization and an automaton that simulates the sequence of corrections. We describe a metric that measures the cost of these corrections. The framework is evaluated on French broadcast news drawn from the REPERE corpus.
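To make the idea of a correction cost concrete, here is a minimal sketch that tallies the cost of a sequence of correction actions. The abstract does not name the four elementary actions or define the metric, so the action names, the unit costs, and the function `correction_cost` are all hypothetical placeholders, not the authors' definitions.

```python
from collections import Counter

# Hypothetical action names and unit costs: the abstract does not name the
# four elementary actions, so these are placeholders for illustration only.
ACTION_COSTS = {
    "create_boundary": 1.0,
    "delete_boundary": 1.0,
    "change_label": 1.0,
    "create_label": 1.0,
}


def correction_cost(actions, costs=ACTION_COSTS):
    """Return the total cost and per-action counts for a correction sequence."""
    unknown = [a for a in actions if a not in costs]
    if unknown:
        raise ValueError(f"unknown actions: {unknown}")
    counts = Counter(actions)
    total = sum(costs[name] * n for name, n in counts.items())
    return total, dict(counts)


# A made-up sequence, such as a simulated annotator might produce.
sequence = ["create_boundary", "change_label", "change_label", "delete_boundary"]
total, counts = correction_cost(sequence)
print(total, counts)
```

Keeping the costs in a table makes it easy to weight actions differently, for example if one kind of correction takes a human annotator longer than another.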
