
Speaker diarization for meeting room audio

This paper describes a speaker diarization system developed for the Multiple Distant Microphone (MDM) task of the 2007 NIST Rich Transcription (RT07) Meeting Recognition Evaluation, targeting meeting-room recordings. The system comprises three major modules: data preparation, initial speaker clustering, and cluster purification/merging. Data preparation consists of Wiener filtering and beamforming of the raw multi-channel audio, Time Difference of Arrival (TDOA) estimation, and speech activity detection. On the preprocessed data, initial speaker clustering is performed with two-stage histogram quantization. A modified cluster purification strategy based on high-order GMM clustering is proposed, and the Bayesian Information Criterion (BIC) is applied for cluster merging. The system achieves a competitive overall diarization error rate (DER) of 8.31% on the RT07 MDM speaker diarization task.
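The abstract names BIC as the merging criterion but does not give its form. What follows is a minimal Python sketch of one common formulation, assumed here rather than taken from the paper. Each cluster, and the union of the two, is modelled by a single full-covariance Gaussian over the feature frames, and

ΔBIC = (N/2) log|Σ| - (N1/2) log|Σ1| - (N2/2) log|Σ2| - (λ/2)(d + d(d+1)/2) log N,

where N1 and N2 are the frame counts of the two clusters, N = N1 + N2, d is the feature dimension and λ is a tunable penalty weight. The function names (`covariance`, `log_det`, `delta_bic`, `should_merge`) and the `penalty_weight` and `eps` parameters are hypothetical. The actual system may use GMM cluster models or a different penalty.

```python
import math
from typing import List

Frames = List[List[float]]  # one feature vector (e.g. cepstral coefficients) per frame


def covariance(frames: Frames) -> List[List[float]]:
    """Maximum-likelihood (divide-by-N) full covariance of a set of frames."""
    n, d = len(frames), len(frames[0])
    mean = [sum(f[j] for f in frames) / n for j in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    for f in frames:
        diff = [f[j] - mean[j] for j in range(d)]
        for a in range(d):
            for b in range(d):
                cov[a][b] += diff[a] * diff[b] / n
    return cov


def log_det(matrix: List[List[float]], eps: float = 1e-6) -> float:
    """Log-determinant of a symmetric positive-definite matrix via Cholesky.

    eps is a small ridge added to the diagonal for numerical stability
    (an assumption of this sketch, not a detail from the paper).
    """
    d = len(matrix)
    lower = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = matrix[i][j] + (eps if i == j else 0.0)
            s -= sum(lower[i][k] * lower[j][k] for k in range(j))
            lower[i][j] = math.sqrt(s) if i == j else s / lower[j][j]
    # det = prod(diag(L))^2, so log det = 2 * sum(log diag(L))
    return 2.0 * sum(math.log(lower[i][i]) for i in range(d))


def delta_bic(x: Frames, y: Frames, penalty_weight: float = 1.0) -> float:
    """Delta-BIC of modelling x and y separately versus as one cluster.

    Each model is a single full-covariance Gaussian; penalty_weight is the
    hypothetical lambda. Positive values favour keeping the clusters apart.
    """
    n1, n2 = len(x), len(y)
    n, d = n1 + n2, len(x[0])
    # Log-likelihood gain of two Gaussians over one (constant terms cancel).
    gain = 0.5 * (n * log_det(covariance(x + y))
                  - n1 * log_det(covariance(x))
                  - n2 * log_det(covariance(y)))
    # The extra Gaussian adds d mean + d(d+1)/2 covariance parameters.
    num_params = d + d * (d + 1) / 2
    return gain - 0.5 * penalty_weight * num_params * math.log(n)


def should_merge(x: Frames, y: Frames, penalty_weight: float = 1.0) -> bool:
    """Merge when the extra model is not justified, i.e. delta-BIC < 0."""
    return delta_bic(x, y, penalty_weight) < 0.0
```

Under this convention two clusters are merged when ΔBIC < 0, meaning the likelihood gained by keeping them apart does not pay for the extra parameters. An agglomerative loop would typically merge the lowest-scoring pair and repeat until no pair qualifies.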

Bin Ma | Haizhou Li | Tin Lay Nwe | Hanwu Sun
