Robust speaker diarization for meetings: ICSI RT06s evaluation system

In this paper we present the ICSI speaker diarization system submitted to the NIST Rich Transcription evaluation (RT06s) [1], conducted in the meeting environment. These evaluations are held yearly, and in the last two years they have included speaker diarization of two distinct kinds of meetings: conference room and lecture room. The system focuses on being robust to changes in meeting conditions by not using any training data. We introduce four of the main improvements to the system since last year's submission. The first is a new training-free speech/non-speech detection algorithm. The second is a new algorithm for system initialization. The third is a frame purification algorithm that increases the differentiability of clusters. The last is the use of inter-channel delays as features, which greatly improves performance. We report the diarization error rate (DER) of this system on all meeting datasets available to date for the multiple distant microphone (MDM) and single distant microphone (SDM) conditions.

Index Terms: speaker diarization, speaker segmentation and clustering, meetings indexing.
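Since results are reported as DER, a small illustration of how that score is composed may help. DER is commonly defined as the sum of missed speech, false-alarm speech and speaker-error time, divided by the total scored reference speech time. The sketch below is not the NIST scoring tool and is not taken from the paper: it assumes frame-level labels with at most one speaker per frame (no overlap, no forgiveness collar), and the function name `frame_der` is hypothetical. Hypothesis speakers are mapped to reference speakers by brute force, which is only practical for a handful of speakers.

```python
from itertools import permutations


def frame_der(ref, hyp):
    """Frame-level DER sketch. Labels are speaker ids, or None for non-speech.

    Assumptions (not from the paper): one speaker per frame, no overlap,
    no collar, brute-force speaker mapping.
    """
    if len(ref) != len(hyp):
        raise ValueError("ref and hyp must have the same number of frames")

    scored = sum(1 for r in ref if r is not None)
    if scored == 0:
        raise ValueError("reference contains no speech frames")

    pairs = list(zip(ref, hyp))
    missed = sum(1 for r, h in pairs if r is not None and h is None)
    false_alarm = sum(1 for r, h in pairs if r is None and h is not None)
    both_speech = sum(1 for r, h in pairs if r is not None and h is not None)

    ref_speakers = sorted({r for r in ref if r is not None})
    hyp_speakers = sorted({h for h in hyp if h is not None})

    # Pad with None so a hypothesis speaker may stay unmapped.
    candidates = ref_speakers + [None] * len(hyp_speakers)

    # Try every one-to-one mapping and keep the one with most correct frames.
    best_correct = 0
    for perm in permutations(candidates, len(hyp_speakers)):
        mapping = dict(zip(hyp_speakers, perm))
        correct = sum(
            1
            for r, h in pairs
            if r is not None and h is not None and mapping[h] == r
        )
        best_correct = max(best_correct, correct)

    speaker_error = both_speech - best_correct
    return (missed + false_alarm + speaker_error) / scored


if __name__ == "__main__":
    ref = ["A", "A", "A", "A", None, None, "B", "B", "B", "B"]
    hyp = ["x", "x", "x", "y", None, "y", "y", "y", "y", None]
    print(frame_der(ref, hyp))
```

In this toy example one frame is missed, one is a false alarm and one is attributed to the wrong speaker, out of eight scored reference speech frames. Hypothesis labels are arbitrary, so a perfect segmentation with different speaker names still scores zero.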

[1] S. Chen, et al. Speaker, Environment and Channel Change Detection and Clustering via the Bayesian Information Criterion, 1998.

[2] Jitendra Ajmera, et al. A Robust Speaker Clustering Algorithm, 2003, IEEE Workshop on Automatic Speech Recognition and Understanding.

[3] Xavier Anguera Miró, et al. Robust Speaker Segmentation for Meetings: The ICSI-SRI Spring 2005 Diarization System, 2005, MLMI.

[4] Douglas A. Reynolds, et al. Approaches and Applications of Audio Diarization, 2005, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '05).

[5] Andreas Stolcke, et al. Further Progress in Meeting Recognition: The ICSI-SRI Spring 2005 Speech-to-Text Evaluation System, 2005, MLMI.

[6] X. Anguera, et al. Speaker Diarization for Multi-Party Meetings Using Acoustic Fusion, 2005, IEEE Workshop on Automatic Speech Recognition and Understanding.

[7] Chuck Wooters, et al. Frame Purification for Cluster Comparison in Speaker Diarization, 2006.

[8] Climent Nadeu, et al. Hybrid Speech/Non-Speech Detector Applied to Speaker Diarization of Meetings, 2006, IEEE Odyssey - The Speaker and Language Recognition Workshop.

[9] Xavier Anguera Miró, et al. Speaker Diarization for Multiple Distant Microphone Meetings: Mixing Acoustic Features and Inter-Channel Time Differences, 2006, INTERSPEECH.

[10] Xavier Anguera Miró, et al. Friends and Enemies: A Novel Initialization for Speaker Diarization, 2006, INTERSPEECH.

[11] Xavier Anguera Miró, et al. Automatic Cluster Complexity and Quantity Selection: Towards Robust Speaker Diarization, 2006, MLMI.

[12] Nikki Mirghafori, et al. Nuts and Flakes: A Study of Data Characteristics in Speaker Diarization, 2006, IEEE International Conference on Acoustics, Speech and Signal Processing.