Speaker diarization for multiple distant microphone meetings: mixing acoustic features and inter-channel time differences

Speaker diarization for meeting recordings consists of identifying the number of participants in each meeting and creating a list of speech time intervals for each participant. In recently published work [7] we presented experiments on this task using only TDOA values (Time Delay of Arrival between channels) and demonstrated that the information in those values can be used to segment the speakers. In this paper we develop a method that combines the TDOA values with the acoustic features by calculating a combined log-likelihood over both sets of vectors. Using this method we reduce the diarization error rate (DER) by 16.34% (relative) on the NIST RT05s set (scored without overlap, against manually transcribed references), by 21% (relative) on our devel06s set (scored with overlap, against force-aligned references), and by 15% (relative) on the NIST RT06s set (scored with overlap, against manually transcribed references).

Index terms: Speaker diarization, speaker segmentation, meetings recognition.
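As an illustration of what a combined log-likelihood over the two feature streams might look like, the sketch below scores one frame against two speaker models. It is a minimal sketch under stated assumptions, not the system described in the abstract: the weighted-sum form, the weight value `acoustic_weight=0.9`, the single diagonal Gaussian per stream, and all names (`diag_gaussian_loglik`, `combined_loglik`, `spk_a`, `spk_b`) and toy numbers are hypothetical choices made for illustration.

```python
import math


def diag_gaussian_loglik(x, mean, var):
    """Log-density of vector x under a diagonal-covariance Gaussian."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def combined_loglik(acoustic_ll, tdoa_ll, acoustic_weight=0.9):
    """Weighted sum of the per-stream log-likelihoods.

    ASSUMPTION: the weighted-sum form and the default weight are
    illustrative; the abstract only says the two are combined.
    """
    if not 0.0 <= acoustic_weight <= 1.0:
        raise ValueError("acoustic_weight must lie in [0, 1]")
    return acoustic_weight * acoustic_ll + (1.0 - acoustic_weight) * tdoa_ll


# Toy speaker models (hypothetical): (mean, variance) per feature stream.
# Both speakers share the same acoustic model, so only TDOA separates them.
speakers = {
    "spk_a": {"acoustic": ([0.0, 0.0], [1.0, 1.0]), "tdoa": ([3.0], [0.5])},
    "spk_b": {"acoustic": ([0.0, 0.0], [1.0, 1.0]), "tdoa": ([-2.0], [0.5])},
}


def frame_score(acoustic_frame, tdoa_frame, model, acoustic_weight=0.9):
    """Combined log-likelihood of one frame under one speaker model."""
    acoustic_ll = diag_gaussian_loglik(acoustic_frame, *model["acoustic"])
    tdoa_ll = diag_gaussian_loglik(tdoa_frame, *model["tdoa"])
    return combined_loglik(acoustic_ll, tdoa_ll, acoustic_weight)


# One frame: an acoustic feature vector and a delay (in samples) for one
# microphone pair. The delay is close to spk_a's TDOA mean.
acoustic_frame = [0.2, -0.1]
tdoa_frame = [2.8]

best_speaker = max(
    speakers,
    key=lambda name: frame_score(acoustic_frame, tdoa_frame, speakers[name]),
)
print(best_speaker)
```

In this toy setup the acoustic stream cannot tell the two speakers apart, so the delay stream decides the assignment. With real models the weight would have to be tuned, since the two streams differ in dimensionality and dynamic range.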

[1] Xavier Anguera Miró, et al. Speaker Diarization for Multi-microphone Meetings Using Only Between-Channel Differences, 2006, MLMI.

[2] Daniel P. W. Ellis, et al. Speaker turn segmentation based on between-channel differences, 2004.

[3] Barbara Peskin, et al. Towards Robust Speaker Segmentation: The ICSI-SRI Fall 2004 Diarization System, 2004.

[4] Daniel P. W. Ellis, et al. Using acoustic condition clustering to improve acoustic change detection on broadcast news, 2000, INTERSPEECH.

[5] S. Chen, et al. Speaker, Environment and Channel Change Detection and Clustering via the Bayesian Information Criterion, 1998.

[6] Andreas Stolcke, et al. Further Progress in Meeting Recognition: The ICSI-SRI Spring 2005 Speech-to-Text Evaluation System, 2005, MLMI.

[7] Xavier Anguera Miró, et al. Robust Speaker Segmentation for Meetings: The ICSI-SRI Spring 2005 Diarization System, 2005, MLMI.

[8] Climent Nadeu, et al. Hybrid Speech/non-speech detector applied to Speaker Diarization of Meetings, 2006, IEEE Odyssey - The Speaker and Language Recognition Workshop.

[9] G. G. Stokes. "J.", 1890, The New Yale Book of Quotations.

[10] Andreas Stolcke, et al. The ICSI Meeting Project: Resources and Research, 2004.

[11] X. Anguera, et al. Speaker diarization for multi-party meetings using acoustic fusion, 2005, IEEE Workshop on Automatic Speech Recognition and Understanding.

[12] Jitendra Ajmera, et al. A robust speaker clustering algorithm, 2003, IEEE Workshop on Automatic Speech Recognition and Understanding.