论文信息 - Robust Speaker Segmentation for Meetings: The ICSI-SRI Spring 2005 Diarization System

Robust Speaker Segmentation for Meetings: The ICSI-SRI Spring 2005 Diarization System

In this paper we describe the ICSI-SRI entry in the Rich Transcription 2005 Spring Meeting Recognition Evaluation. The current system is based on the ICSI-SRI clustering system for Broadcast News (BN), with extra modules to process the different meetings tasks in which we participated. Our base system uses agglomerative clustering with a modified Bayesian Information Criterion (BIC) measure to determine when to stop merging clusters and to decide which pairs of clusters to merge. This approach does not require any pre-trained models, thus increasing robustness and simplifying the port from BN to the meetings domain. For the meetings domain, we have added several features to our baseline clustering system, including a “purification” module that tries to keep the clusters acoustically homogeneous throughout the clustering process, and a delay&sum beamforming algorithm which enhances signal quality for the multiple distant microphones (MDM) sub-task. In post-evaluation work we further improved the delay&sum algorithm, experimented with a new speech/non-speech detector and proposed a new system for the lecture room environment.

[1] H. Bourlard,et al. Improved Unknown-Multiple Speaker clustering using HMM , 2002 .

[2] Jitendra Ajmera,et al. A robust speaker clustering algorithm , 2003, 2003 IEEE Workshop on Automatic Speech Recognition and Understanding (IEEE Cat. No.03EX721).

[3] S. Chen,et al. Speaker, Environment and Channel Change Detection and Clustering via the Bayesian Information Criterion , 1998 .

[4] Barbara Peskin,et al. TOWARDS ROBUST SPEAKER SEGMENTATION: THE ICSI-SRI FALL 2004 DIARIZATION SYSTEM , 2004 .

[5] Michael S. Brandstein,et al. A robust method for speech signal time-delay estimation in reverberant rooms , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[6] Hans-Günter Hirsch. HMM adaptation for applications in telecommunication , 2001, Speech Commun..

[7] Hervé Bourlard,et al. Unknown-multiple speaker clustering using HMM , 2002, INTERSPEECH.

[8] J. Flanagan,et al. Computer‐steered microphone arrays for sound transduction in large rooms , 1985 .