Paper information - Multistage speaker diarization of broadcast news

Multistage speaker diarization of broadcast news

This paper describes recent advances in speaker diarization with a multistage segmentation and clustering system, which incorporates a speaker identification step. This system builds upon the baseline audio partitioner used in the LIMSI broadcast news transcription system. The baseline partitioner provides a high cluster purity, but has a tendency to split data from speakers with a large quantity of data into several segment clusters. Several improvements to the baseline system have been made. First, the iterative Gaussian mixture model (GMM) clustering has been replaced by a Bayesian information criterion (BIC) agglomerative clustering. Second, an additional clustering stage has been added, using a GMM-based speaker identification method. Finally, a post-processing stage refines the segment boundaries using the output of a transcription system. On the National Institute of Standards and Technology (NIST) RT-04F and ESTER evaluation data, the multistage system reduces the speaker error by over 70% relative to the baseline system, and gives between 40% and 50% reduction relative to a single-stage BIC clustering system.
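The BIC agglomerative clustering stage named in the abstract can be illustrated with a minimal sketch. The abstract gives no implementation details, so everything below is an assumption made for illustration: each cluster is modeled by a single diagonal-covariance Gaussian, the penalty uses only the frame count of the two clusters being compared, and the names `log_det_diag`, `delta_bic`, `bic_cluster`, and the penalty weight `lam` are hypothetical. It is not the LIMSI system.

```python
import math


def log_det_diag(frames):
    """Log-determinant of the diagonal ML covariance of a list of feature vectors."""
    n = len(frames)
    dim = len(frames[0])
    total = 0.0
    for d in range(dim):
        mean = sum(f[d] for f in frames) / n
        var = sum((f[d] - mean) ** 2 for f in frames) / n
        total += math.log(max(var, 1e-6))  # variance floor (assumed value)
    return total


def delta_bic(a, b, lam=1.0):
    """BIC difference between modeling a and b separately or as one cluster.

    A negative value favors merging the two clusters.
    """
    n_a, n_b = len(a), len(b)
    n = n_a + n_b
    dim = len(a[0])
    # A diagonal Gaussian has 2 * dim free parameters (means and variances).
    penalty = 0.5 * lam * (2 * dim) * math.log(n)
    gain = 0.5 * (n * log_det_diag(a + b)
                  - n_a * log_det_diag(a)
                  - n_b * log_det_diag(b))
    return gain - penalty


def bic_cluster(segments, lam=1.0):
    """Greedy bottom-up clustering: merge the pair with the lowest delta BIC
    until no pair has a negative value."""
    clusters = [list(s) for s in segments]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                score = delta_bic(clusters[i], clusters[j], lam)
                if best is None or score < best[0]:
                    best = (score, i, j)
        if best[0] >= 0:
            break  # every remaining merge would increase the BIC
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters
```

Each input segment is a list of feature vectors (for example, cepstral frames), and `bic_cluster` returns the merged clusters as lists of frames. Raising `lam` makes merging more likely; lowering it keeps clusters purer at the cost of splitting speakers, which is the trade-off the abstract attributes to the baseline partitioner.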

[1] H. Hermansky, et al. Perceptual linear predictive (PLP) analysis of speech, 1990, The Journal of the Acoustical Society of America.

[2] Douglas A. Reynolds, et al. Speaker Verification Using Adapted Gaussian Mixture Models, 2000, Digit. Signal Process.

[3] Jean-François Bonastre, et al. E-HMM approach for learning and adapting sound models for speaker indexing, 2001, Odyssey.

[4] Alexander H. Waibel, et al. Strategies for automatic segmentation of audio data, 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100).

[5] Frédéric Bimbot, et al. Speaker diarization using bottom-up clustering based on a parameter-derived distance between adapted GMMs, 2004, INTERSPEECH.

[6] Douglas A. Reynolds, et al. Blind clustering of speech utterances based on speaker and language characteristics, 1998, ICSLP.

[7] Douglas A. Reynolds, et al. Approaches and applications of audio diarization, 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005.

[8] Guillaume Gravier, et al. Corpus description of the ESTER Evaluation Campaign for the Rich Transcription of French Broadcast News, 2004, LREC.

[9] Guillaume Gravier, et al. The ESTER phase II evaluation campaign for the rich transcription of French broadcast news, 2005, INTERSPEECH.

[10] Thomas Hain, et al. Segmentation and classification of broadcast news audio, 1998, ICSLP.

[11] Jean-Luc Gauvain, et al. Feature and score normalization for speaker verification of cellular data, 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03).

[12] Jitendra Ajmera, et al. A robust speaker clustering algorithm, 2003, 2003 IEEE Workshop on Automatic Speech Recognition and Understanding (IEEE Cat. No.03EX721).

[13] Sridha Sridharan, et al. Feature warping for robust speaker verification, 2001, Odyssey.

[14] Mauro Cettolo. Segmentation, classification and clustering of an Italian broadcast news corpus, 2000.

[15] Mauro Cettolo, et al. Evaluation of BIC-based algorithms for audio segmentation, 2005, Comput. Speech Lang.

[16] Jean-Luc Gauvain, et al. Combining speaker identification and BIC for speaker diarization, 2005, INTERSPEECH.

[17] John Makhoul, et al. The 2004 BBN/LIMSI 10xRT English Broadcast News Transcription System, 2004.

[18] Chin-Hui Lee, et al. Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains, 1994, IEEE Trans. Speech Audio Process.

[19] Christian Wellekens, et al. DISTBIC: A speaker-based segmentation for audio data indexing, 2000, Speech Commun.

[20] S. Chen, et al. Speaker, Environment and Channel Change Detection and Clustering via the Bayesian Information Criterion, 1998.

[21] Jean-Luc Gauvain, et al. Partitioning and transcription of broadcast news data, 1998, ICSLP.

[22] Gunnar Evermann, et al. An investigation into the interactions between speaker diarisation systems and automatic speech transcription, 2003.

[23] Mark J. F. Gales, et al. The Cambridge University March 2005 speaker diarisation system, 2005, INTERSPEECH.

[24] Jean-Luc Gauvain, et al. Audio Partitioning and Transcription for Broadcast Data Indexation, 2004, Multimedia Tools and Applications.

[25] Jean-Luc Gauvain, et al. The LIMSI Broadcast News transcription system, 2002, Speech Commun.

[26] Jean-Claude Junqua, et al. Towards domain independent speaker clustering, 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03).

[27] Jean-Luc Gauvain, et al. Speaker diarization from speech transcripts, 2004, INTERSPEECH.