Speaker Diarization: About whom the Speaker is Talking ?

The automatic speaker diarization consists in splitting the signal into homogeneous segments and clustering them by speakers. However the speaker segments are specified with anonymous labels. This paper suggests a solution to identify those speakers by extracting their full names pronounced in French broadcast news. A semantic classification tree is automatically built on a training corpus and associate the full names detected in the transcription of a segment to this segment or to one of its neighbors. Then, a merging method permits to associate a full name to a speaker cluster instead of an anonymous label provided by the diarization. The experiments are carried out over French broadcast news records from the ESTER 2005 evaluation campaign. About 70% show duration is correctly processed for both development and evaluation corpora. On the evaluation corpus, 18.2% show duration is wrongly named and no decision is taken for 11.9% show duration

[1]  Douglas A. Reynolds,et al.  A Tutorial on Text-Independent Speaker Verification , 2004, EURASIP J. Adv. Signal Process..

[2]  Frédéric Béchet,et al.  Stochastic finite state automata language model triggered by dialogue states , 2001, INTERSPEECH.

[3]  Frédéric Bimbot,et al.  Speaker diarization using bottom-up clustering based on a parameter-derived distance between adapted GMMs , 2004, INTERSPEECH.

[4]  Frédéric Béchet,et al.  Tagging Unknown Proper Names Using Decision Trees , 2000, ACL.

[5]  Guillaume Gravier,et al.  The ESTER phase II evaluation campaign for the rich transcription of French broadcast news , 2005, INTERSPEECH.

[6]  Jean-Luc Gauvain,et al.  Improving Speaker Diarization , 2004 .

[7]  Guillaume Gravier,et al.  Corpus description of the ESTER Evaluation Campaign for the Rich Transcription of French Broadcast News , 2004, LREC.

[8]  Renato De Mori,et al.  The Application of Semantic Classification Trees to Natural Language Understanding , 1995, IEEE Trans. Pattern Anal. Mach. Intell..

[9]  Jitendra Ajmera,et al.  A robust speaker clustering algorithm , 2003, 2003 IEEE Workshop on Automatic Speech Recognition and Understanding (IEEE Cat. No.03EX721).

[10]  L. Lamel,et al.  A comparative study using manual and automatic transcriptions for diarization , 2005, IEEE Workshop on Automatic Speech Recognition and Understanding, 2005..

[11]  Fall 2004 Rich Transcription ( RT-04 F ) Evaluation Plan , .

[12]  Douglas A. Reynolds,et al.  Speaker diarisation for broadcast news , 2004, Odyssey.

[13]  Jean-Luc Gauvain,et al.  Speaker diarization from speech transcripts , 2004, INTERSPEECH.

[14]  Jean-François Bonastre,et al.  Step-by-step and integrated approaches in broadcast news speaker diarization , 2006, Comput. Speech Lang..