Paper information - Combining speaker identification and BIC for speaker diarization

This paper describes recent advances in speaker diarization by incorporating a speaker identification step. This system builds upon the LIMSI baseline data partitioner used in the broadcast news transcription system. This partitioner provides a high cluster purity but has a tendency to split the data from a speaker into several clusters when there is a large quantity of data for the speaker. Several improvements to the baseline system have been made. Firstly, a standard Bayesian information criterion (BIC) agglomerative clustering has been integrated, replacing the iterative Gaussian mixture model (GMM) clustering. Then a second clustering stage has been added, using a speaker identification method with MAP-adapted GMM. A final post-processing stage refines the segment boundaries using the output of the transcription system. On the RT-04f and ESTER evaluation data, the improved multi-stage system provides between 40% and 50% reduction of the speaker error, relative to a standard BIC clustering system.
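For readers unfamiliar with the BIC agglomerative clustering named in the abstract, the following is a minimal sketch of the general technique, not the authors' implementation. It assumes each cluster is modelled by a single Gaussian with diagonal covariance (a simplification; full covariance is common), and that the penalty uses the frame count of the two clusters being compared. The names `diag_log_det`, `delta_bic`, `bic_cluster` and the penalty weight `lam` are hypothetical.

```python
import math


def diag_log_det(frames):
    """Log-determinant of the diagonal maximum-likelihood covariance of the frames."""
    n = len(frames)
    dim = len(frames[0])
    total = 0.0
    for k in range(dim):
        mean = sum(f[k] for f in frames) / n
        var = sum((f[k] - mean) ** 2 for f in frames) / n
        total += math.log(max(var, 1e-8))  # variance floor avoids log(0)
    return total


def delta_bic(a, b, lam=1.0):
    """BIC difference between modelling a and b separately versus as one cluster.

    A negative value favours merging. Assumption: diagonal Gaussians, so each
    model has 2 * dim free parameters (one mean and one variance per dimension).
    """
    n_a, n_b = len(a), len(b)
    n = n_a + n_b
    dim = len(a[0])
    penalty = 0.5 * lam * (2 * dim) * math.log(n)
    return (0.5 * n * diag_log_det(a + b)
            - 0.5 * n_a * diag_log_det(a)
            - 0.5 * n_b * diag_log_det(b)
            - penalty)


def bic_cluster(segments, lam=1.0):
    """Greedily merge the pair with the lowest delta BIC until none is negative."""
    clusters = [list(s) for s in segments]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                score = delta_bic(clusters[i], clusters[j], lam)
                if best is None or score < best[0]:
                    best = (score, i, j)
        if best[0] >= 0:
            break  # no remaining pair is better modelled as a single cluster
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters
```

Each segment is a list of feature vectors (lists of floats). Raising `lam` makes merging more likely and lowering it yields more, smaller clusters. The second stage described in the abstract (speaker identification with MAP-adapted GMMs) is not covered by this sketch.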

Jean-Luc Gauvain | Xuan Zhu | Sylvain Meignier | Claude Barras

[1] Douglas A. Reynolds, et al. Speaker Verification Using Adapted Gaussian Mixture Models, 2000, Digit. Signal Process.

[2] Sridha Sridharan, et al. Feature warping for robust speaker verification, 2001, Odyssey.

[3] Jean-Luc Gauvain, et al. Partitioning and transcription of broadcast news data, 1998, ICSLP.

[4] M. A. Siegler, et al. Automatic Segmentation, Classification and Clustering of Broadcast News Audio, 1997.

[5] Douglas A. Reynolds, et al. Blind clustering of speech utterances based on speaker and language characteristics, 1998, ICSLP.

[6] Jean-François Bonastre, et al. E-HMM approach for learning and adapting sound models for speaker indexing, 2001, Odyssey.

[7] Jean-Luc Gauvain, et al. Feature and score normalization for speaker verification of cellular data, 2003, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '03).

[8] S. Chen, et al. Speaker, Environment and Channel Change Detection and Clustering via the Bayesian Information Criterion, 1998.

[9] Jean-Claude Junqua, et al. Towards domain independent speaker clustering, 2003, IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '03).

[10] Jean-Luc Gauvain, et al. Speaker diarization from speech transcripts, 2004, INTERSPEECH.

[11] Steve Young, et al. Segment generation and clustering in the HTK broadcast news transcription system, 1998.