Paper information - Evaluating human corrections in a computer-assisted speaker diarization system

In this paper, we present a framework for evaluating human corrections of the output of a speaker diarization system. We propose four elementary actions to correct the diarization (“Create a boundary”, “Delete a boundary”, “Create a speaker label” and “Change the speaker label”) and an automaton that simulates the correction sequence. We also describe a metric that measures the correction cost. The framework is evaluated on French broadcast news drawn from the REPERE, ESTER and ETAPE campaigns.
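The abstract names the four elementary actions but does not specify the automaton or the cost metric, so the following Python sketch is only an illustration of how such a correction sequence might be simulated. Everything in it is an assumption rather than the authors' method: the function name `count_corrections`, the `tolerance` parameter for matching boundaries, the left-to-right labelling policy, and the unit cost per action are all hypothetical. Segments are assumed to be sorted, contiguous, and to cover the same time span in both diarizations.

```python
from collections import Counter

# A segment is (start_time, end_time, speaker_label), times in seconds.
Segment = tuple[float, float, str]


def boundaries(segments: list[Segment]) -> list[float]:
    """Inner boundaries: the start of every segment except the first."""
    return sorted({start for start, _, _ in segments[1:]})


def label_at(segments: list[Segment], t: float) -> str:
    """Label of the segment that contains time t."""
    for start, end, label in segments:
        if start <= t < end:
            return label
    raise ValueError(f"time {t} is not covered by any segment")


def count_corrections(reference: list[Segment],
                      hypothesis: list[Segment],
                      tolerance: float = 0.25) -> Counter:
    """Count elementary actions needed to turn hypothesis into reference.

    Assumed policy (not taken from the paper): boundaries are fixed first,
    then segments are labelled from left to right.
    """
    actions = Counter()
    ref_b, hyp_b = boundaries(reference), boundaries(hypothesis)

    # A hypothesis boundary with no reference boundary nearby is deleted.
    actions["delete_boundary"] = sum(
        all(abs(h - r) > tolerance for r in ref_b) for h in hyp_b)
    # A reference boundary with no hypothesis boundary nearby is created.
    actions["create_boundary"] = sum(
        all(abs(r - h) > tolerance for h in hyp_b) for r in ref_b)

    names: dict[str, str] = {}  # hypothesis cluster -> speaker name given
    known: set[str] = set()     # speaker names created so far
    for start, end, ref_speaker in reference:
        hyp_label = label_at(hypothesis, (start + end) / 2)
        if names.get(hyp_label) == ref_speaker:
            continue  # the cluster already carries the right name
        if ref_speaker not in known:
            actions["create_label"] += 1
            known.add(ref_speaker)
            # Naming a still-unnamed cluster applies to all its segments.
            names.setdefault(hyp_label, ref_speaker)
        else:
            actions["change_label"] += 1
    return actions


reference = [(0.0, 10.0, "Alice"), (10.0, 20.0, "Bob"),
             (20.0, 30.0, "Alice"), (30.0, 40.0, "Bob")]
hypothesis = [(0.0, 10.0, "S0"), (10.0, 25.0, "S1"), (25.0, 40.0, "S2")]

actions = count_corrections(reference, hypothesis)
total_cost = sum(actions.values())  # assumed unit cost per action
print(dict(actions), total_cost)
```

A weighted sum over the four counters, for example with weights reflecting the time each action takes an annotator, could replace the unit cost used here; the abstract does not say which weighting, if any, the paper adopts.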

Sylvain Meignier | Jean Carrive | David Doukhan | Pierre-Alexandre Broux | Simon Petitrenaud
