Computer-assisted Speaker Diarization: How to Evaluate Human Corrections

In this paper, we present a framework for evaluating human corrections of speaker diarization output. We propose four elementary actions for correcting a diarization and an automaton that simulates the sequence of corrections. We also describe a metric that measures the correction cost. The framework is evaluated on French broadcast news drawn from the REPERE corpus.
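The abstract does not define the four elementary actions or the automaton. The sketch below is therefore only a rough illustration of the general idea of scoring a correction effort by counting elementary actions. It does not reproduce the paper's method. Its four actions are hypothetical: move a boundary, split a segment, merge two segments, and relabel a segment. Each action costs one unit. The sketch also assumes a greedy boundary matching with a tolerance `tol` in seconds, and hypothesis labels that are already mapped onto reference speaker names.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    start: float  # seconds
    end: float    # seconds
    label: str    # speaker label, ASSUMED already mapped to reference names


def internal_boundaries(segs):
    """Boundaries between consecutive segments (the span edges are excluded)."""
    return [s.end for s in segs[:-1]]


def correction_actions(hyp, ref, tol=0.5):
    """Count HYPOTHETICAL elementary actions that turn `hyp` into `ref`.

    Both segment lists are assumed sorted, contiguous, and covering the same
    time span. The action set is illustrative, not the paper's definition.
    """
    counts = {"move": 0, "split": 0, "merge": 0, "relabel": 0}
    unmatched = internal_boundaries(hyp)
    for r in internal_boundaries(ref):
        # Greedily pair each reference boundary with the closest unmatched
        # hypothesis boundary that lies within the tolerance.
        candidates = [h for h in unmatched if abs(h - r) <= tol]
        if candidates:
            h = min(candidates, key=lambda b: abs(b - r))
            unmatched.remove(h)
            if abs(h - r) > 1e-9:
                counts["move"] += 1  # boundary is close but must be shifted
        else:
            counts["split"] += 1  # a missing boundary must be inserted
    counts["merge"] += len(unmatched)  # each spurious boundary is deleted
    # Once the boundaries are fixed, assume each corrected segment carries the
    # hypothesis label that covers most of it; a wrong label costs a relabel.
    for seg in ref:
        overlap = defaultdict(float)
        for h in hyp:
            overlap[h.label] += max(0.0, min(seg.end, h.end) - max(seg.start, h.start))
        if max(overlap, key=overlap.get) != seg.label:
            counts["relabel"] += 1
    return counts


if __name__ == "__main__":
    ref = [Segment(0, 10, "A"), Segment(10, 20, "B"), Segment(20, 30, "A")]
    hyp = [Segment(0, 10.3, "A"), Segment(10.3, 25, "B"),
           Segment(25, 27, "C"), Segment(27, 30, "A")]
    counts = correction_actions(hyp, ref)
    print(counts)                # {'move': 1, 'split': 1, 'merge': 2, 'relabel': 1}
    print(sum(counts.values()))  # 5
```

In this toy run, the boundary at 10.3 s falls within the tolerance and is moved. The missing boundary at 20 s requires a split, and the spurious boundaries at 25 s and 27 s are merged away. The last segment ends up labelled `B` by majority, so it needs one relabel. A real evaluation would weight actions differently or model the annotator's navigation. The sketch covers neither.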
