Evaluating human corrections in a computer-assisted speaker diarization system

This paper presents a framework for evaluating the human corrections applied to the output of a speaker diarization system. We propose four elementary correction actions (“Create a boundary”, “Delete a boundary”, “Create a speaker label” and “Change the speaker label”) and an automaton that simulates the correction sequence. A metric is described to evaluate the correction cost. The framework is evaluated on French broadcast news drawn from the REPERE, ESTER and ETAPE campaigns.
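
To make the idea of a simulated correction sequence concrete, the following Python sketch replays the four elementary actions on a toy example and sums their costs. It is only an illustration under stated assumptions, not the automaton or the metric of the paper: the segment representation, the function names (`simulate_corrections`, `correction_cost`), the midpoint rule used to read the hypothesis label, the shared label namespace between hypothesis and reference, and the unit cost per action are all hypothetical choices made here.

```python
from typing import Dict, List, Tuple

Segment = Tuple[float, float, str]  # (start, end, speaker label)
Action = Tuple[str, float]          # (action name, time it applies to)

# The four elementary actions named in the abstract.
CREATE_BOUNDARY = "create_boundary"
DELETE_BOUNDARY = "delete_boundary"
CREATE_LABEL = "create_speaker_label"
CHANGE_LABEL = "change_speaker_label"

# Assumption: every action costs 1. The abstract does not give the weights.
UNIT_COSTS: Dict[str, float] = {
    CREATE_BOUNDARY: 1.0,
    DELETE_BOUNDARY: 1.0,
    CREATE_LABEL: 1.0,
    CHANGE_LABEL: 1.0,
}


def inner_boundaries(segments: List[Segment]) -> set:
    """Boundaries between consecutive segments (assumes sorted, contiguous input)."""
    return {start for start, _end, _label in segments[1:]}


def label_at(segments: List[Segment], t: float):
    """Label of the segment covering time t, or None if no segment covers it."""
    for start, end, label in segments:
        if start <= t < end:
            return label
    return None


def simulate_corrections(hypothesis: List[Segment], reference: List[Segment]) -> List[Action]:
    """Simulated corrector: fix boundaries first, then labels, segment by segment."""
    actions: List[Action] = []
    hyp_b, ref_b = inner_boundaries(hypothesis), inner_boundaries(reference)
    actions += [(CREATE_BOUNDARY, t) for t in sorted(ref_b - hyp_b)]
    actions += [(DELETE_BOUNDARY, t) for t in sorted(hyp_b - ref_b)]

    known = set()  # speaker labels the corrector has already seen or created
    for start, end, ref_label in reference:
        # Assumption: the hypothesis label is read at the segment midpoint.
        hyp_label = label_at(hypothesis, (start + end) / 2)
        if hyp_label != ref_label:
            # Reuse an existing label if possible, otherwise create a new one.
            actions.append((CHANGE_LABEL if ref_label in known else CREATE_LABEL, start))
        known.add(ref_label)
    return actions


def correction_cost(actions: List[Action], costs: Dict[str, float] = UNIT_COSTS) -> float:
    """Weighted sum of the elementary actions."""
    return sum(costs[name] for name, _time in actions)


hypothesis = [(0, 10, "A"), (10, 18, "B"), (18, 30, "A"), (30, 40, "B")]
reference = [(0, 10, "A"), (10, 20, "B"), (20, 30, "C"), (30, 40, "A")]
actions = simulate_corrections(hypothesis, reference)
print(actions)
print(correction_cost(actions))  # 4.0: one action of each kind
```

In this toy case the simulated corrector moves one boundary (one creation at 20 s, one deletion at 18 s), creates the missing speaker "C", and relabels the last segment with the already known speaker "A". Replacing `UNIT_COSTS` with measured per-action costs would be one way to weight the actions differently.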
