Advances in fast multistream diarization based on the information bottleneck framework

Multistream diarization is an effective way to improve diarization performance, with MFCC and Time Delay of Arrival (TDOA) being the most commonly used features. This paper extends our previous work on information bottleneck (IB) diarization to include a large number of features besides MFCC and TDOA while keeping computational costs low. First, HMM/GMM and IB systems are compared using two and four feature streams, and an analysis of errors is performed. Results on a dataset of 17 meetings show that, in spite of comparable oracle performance, the IB system is more robust to feature weight variations. A sequential optimization is then introduced that further improves the speaker error by 5 to 8% relative. Finally, computational issues are discussed. The proposed approach is significantly faster, and its complexity grows only marginally with the number of feature streams: it runs in 0.75 times real time even with four streams, achieving a speaker error of 6%.
