Speaker Diarization: From Broadcast News to Lectures

This paper presents the LIMSI speaker diarization system for lecture data, in the framework of the Rich Transcription 2006 Spring (RT-06S) meeting recognition evaluation. This system builds upon the baseline diarization system designed for broadcast news data. The baseline system combines agglomerative clustering based on Bayesian information criterion with a second clustering using state-of-the-art speaker identification techniques. In the RT-04F evaluation, the baseline system provided an overall diarization error of 8.5% on broadcast news data. However since it has a high missed speech error rate on lecture data, a different speech activity detection approach based on the log-likelihood ratio between the speech and non-speech models trained on the seminar data was explored. The new speaker diarization system integrating this module provides an overall diarization error of 20.2% on the RT-06S Multiple Distant Microphone (MDM) data.

[1]  Douglas A. Reynolds,et al.  Blind clustering of speech utterances based on speaker and language characteristics , 1998, ICSLP.

[2]  Sridha Sridharan,et al.  Feature warping for robust speaker verification , 2001, Odyssey.

[3]  Mauro Cettolo Segmentation, classification and clustering of an Italian broadcast news corpus , 2000 .

[4]  Douglas A. Reynolds,et al.  Speaker Verification Using Adapted Gaussian Mixture Models , 2000, Digit. Signal Process..

[5]  Chin-Hui Lee,et al.  Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains , 1994, IEEE Trans. Speech Audio Process..

[6]  John Makhoul,et al.  THE 2004 BBN/LIMSI 10xRT ENGLISH BROADCAST NEWS TRANSCRIPTION SYSTEM , 2004 .

[7]  Dan Istrate,et al.  NIST RT'05S Evaluation: Pre-processing Techniques and Speaker Diarization on Multiple Microphone Meetings , 2005, MLMI.

[8]  Jean-Luc Gauvain,et al.  Multistage speaker diarization of broadcast news , 2006, IEEE Transactions on Audio, Speech, and Language Processing.

[9]  S. Chen,et al.  Speaker, Environment and Channel Change Detection and Clustering via the Bayesian Information Criterion , 1998 .

[10]  Xavier Anguera Miró,et al.  Robust Speaker Segmentation for Meetings: The ICSI-SRI Spring 2005 Diarization System , 2005, MLMI.

[11]  Jean-Luc Gauvain,et al.  Feature and score normalization for speaker verification of cellular data , 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03)..

[12]  Nikki Mirghafori,et al.  Nuts and Flakes: a Study of Data Characteristics in Speaker Diarization , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.

[13]  Douglas A. Reynolds,et al.  Speaker diarisation for broadcast news , 2004, Odyssey.

[14]  Jean-Luc Gauvain,et al.  Combining speaker identification and BIC for speaker diarization , 2005, INTERSPEECH.