Combining transcription-based and acoustic-based speaker identifications for broadcast news

This paper addresses speaker identification in audio recordings of broadcast news. Speaker identity information is extracted by two systems, one transcription-based and one acoustic-based. Their outputs are combined within the belief functions framework, which provides a coherent knowledge representation of the problem. The Kuhn-Munkres algorithm is then used to solve the assignment problem between speaker identities and speaker clusters. Experiments on French broadcast news from the ESTER evaluation campaign show the effectiveness of the proposed combination method.
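To illustrate the assignment step, the sketch below matches speaker clusters to candidate names so that the total score is maximal and no name is used twice. It is not the authors' implementation: the function name `assign_max`, the cluster labels, the speaker names, and the integer scores (standing in for combined identity scores) are all hypothetical. The routine is a compact, potentials-based variant of the Kuhn-Munkres (Hungarian) algorithm and assumes there are at least as many candidate names as clusters.

```python
def assign_max(scores):
    """Return, for each row (cluster), the index of the column (name)
    chosen so that the total score is maximal and columns are distinct.
    Requires len(scores) <= len(scores[0])."""
    n, m = len(scores), len(scores[0])
    if n > m:
        raise ValueError("need at least as many candidate names as clusters")

    # Turn the maximization into a minimization: cost = best score - score.
    top = max(max(row) for row in scores)
    cost = [[top - s for s in row] for row in scores]

    INF = float("inf")
    u = [0] * (n + 1)    # row potentials (1-based)
    v = [0] * (m + 1)    # column potentials (1-based, column 0 is a dummy)
    p = [0] * (m + 1)    # p[j] = row matched to column j, 0 if free
    way = [0] * (m + 1)  # way[j] = previous column on the augmenting path

    for i in range(1, n + 1):
        p[0] = i
        j0 = 0
        minv = [INF] * (m + 1)
        used = [False] * (m + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = p[j0], INF, 0
            # Find the unused column with the smallest reduced cost.
            for j in range(1, m + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            # Update potentials so that a new zero-cost edge appears.
            for j in range(m + 1):
                if used[j]:
                    u[p[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if p[j0] == 0:  # reached a free column: path is complete
                break
        # Flip the matching along the augmenting path.
        while j0 != 0:
            j1 = way[j0]
            p[j0] = p[j1]
            j0 = j1

    assignment = [None] * n
    for j in range(1, m + 1):
        if p[j] != 0:
            assignment[p[j] - 1] = j - 1
    return assignment


# Hypothetical data: rows are speaker clusters, columns are candidate names.
clusters = ["cluster_0", "cluster_1", "cluster_2"]
names = ["Alice Martin", "Bernard Dupont", "Claire Petit"]
scores = [
    [70, 20, 10],
    [60, 30, 10],
    [15, 25, 60],
]
for cluster, j in zip(clusters, assign_max(scores)):
    print(cluster, "->", names[j])
```

In this toy matrix, picking the best name for each cluster independently would give "Alice Martin" to both cluster_0 and cluster_1. The global assignment resolves the conflict by giving cluster_1 its second-best name, which is the kind of one-to-one constraint the assignment formulation enforces.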
