Improved speaker diarization system for meetings

In this paper, we investigate new approaches to improve speech activity detection, speaker segmentation and speaker clustering. The aim is to address speaker diarization of meetings, a task on which error rates remain relatively high. Unlike existing methods, the proposed iterative scheme treats these three tasks as a single problem. A new bidirectional source segmentation method based on GLR/BIC is also proposed. The well-known BIC clustering is revisited, and a new unsupervised post-processing step is added to increase cluster purity. Applied to meeting data, these proposals yield a relative improvement of about 40% over a standard speaker diarization system.
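The abstract relies on the GLR/BIC criterion for segmentation and on BIC for clustering without stating them. The sketch below shows the standard ΔBIC test of Chen and Gopalakrishnan (1998), with one full-covariance Gaussian per segment: ΔBIC = ½[N log|Σ| − N₁ log|Σ₁| − N₂ log|Σ₂|] − λ·½(d + d(d+1)/2)·log N, where the bracketed term is the GLR. A positive value favours two models (a speaker change, or two distinct speakers). The single-pass window scan and greedy merging shown here are generic textbook variants. They are not the paper's bidirectional segmentation, its iterative scheme, or its purity post-processing. The function names (`delta_bic`, `best_change_point`, `bic_cluster`) and defaults (`lam=1.0`, `margin=10`) are hypothetical choices made for illustration.

```python
import math
from typing import List, Optional, Sequence, Tuple

Frames = Sequence[Sequence[float]]  # one feature vector (e.g. MFCCs) per frame


def covariance(frames: Frames) -> List[List[float]]:
    """Maximum-likelihood (biased) full covariance matrix of the frames."""
    n, d = len(frames), len(frames[0])
    mean = [sum(f[j] for f in frames) / n for j in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    for f in frames:
        for i in range(d):
            for j in range(d):
                cov[i][j] += (f[i] - mean[i]) * (f[j] - mean[j])
    return [[c / n for c in row] for row in cov]


def log_det(m: List[List[float]], ridge: float = 1e-6) -> float:
    """log|m| via Cholesky; a small ridge keeps near-singular estimates positive definite."""
    d = len(m)
    low = [[0.0] * d for _ in range(d)]
    for i in range(d):
        for j in range(i + 1):
            s = m[i][j] + (ridge if i == j else 0.0)
            s -= sum(low[i][k] * low[j][k] for k in range(j))
            low[i][j] = math.sqrt(s) if i == j else s / low[j][j]
    return 2.0 * sum(math.log(low[i][i]) for i in range(d))


def delta_bic(x: Frames, y: Frames, lam: float = 1.0) -> float:
    """ΔBIC of modelling x and y with two Gaussians instead of one.

    > 0: two models preferred (change point / different speakers).
    <= 0: one model preferred (same speaker, so the segments can be merged).
    """
    z = list(x) + list(y)
    n, n1, n2, d = len(z), len(x), len(y), len(z[0])
    glr = 0.5 * (n * log_det(covariance(z))
                 - n1 * log_det(covariance(x))
                 - n2 * log_det(covariance(y)))
    penalty = 0.5 * (d + d * (d + 1) / 2) * math.log(n)
    return glr - lam * penalty


def best_change_point(frames: Frames, margin: int = 10,
                      lam: float = 1.0) -> Optional[Tuple[int, float]]:
    """Scan one analysis window; return (split index, ΔBIC) if the best split has ΔBIC > 0.

    `margin` frames are kept on each side so both covariances are estimable (margin > d).
    """
    best: Optional[Tuple[int, float]] = None
    for t in range(margin, len(frames) - margin):
        score = delta_bic(frames[:t], frames[t:], lam)
        if best is None or score > best[1]:
            best = (t, score)
    return best if best is not None and best[1] > 0 else None


def bic_cluster(segments: Sequence[Frames], lam: float = 1.0) -> List[List[int]]:
    """Greedy agglomerative BIC clustering; returns lists of segment indices per cluster."""
    clusters = [([k], list(seg)) for k, seg in enumerate(segments)]
    while len(clusters) > 1:
        # Pick the pair whose merge is most favoured (lowest ΔBIC).
        score, i, j = min(
            (delta_bic(clusters[a][1], clusters[b][1], lam), a, b)
            for a in range(len(clusters)) for b in range(a + 1, len(clusters)))
        if score > 0:  # every remaining pair looks like two different speakers
            break
        clusters[i] = (clusters[i][0] + clusters[j][0], clusters[i][1] + clusters[j][1])
        del clusters[j]  # j > i, so index i is unaffected
    return [ids for ids, _ in clusters]
```

In practice λ is tuned on development data: larger values yield fewer detected change points and fewer, larger clusters.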
