An integrated top-down/bottom-up approach to speaker diarization

Most speaker diarization systems fit into one of two categories: bottom-up or top-down. Bottom-up systems are the most popular but can suffer from instability caused by difficulties with the merging and stopping criteria. Top-down systems deliver competitive results but are particularly prone to poor model initialization, which often leads to large variations in performance. This paper presents a new integrated bottom-up/top-down approach to speaker diarization which aims to harness the strengths of each system and thus to improve performance and stability. In contrast to previous work, here the two systems are fused at the heart of the segmentation and clustering stage. Experimental results show improvements in speaker diarization performance for both meeting and TV-show domain data, indicating increased intra- and inter-domain stability. On the TV-show data in particular, an average relative improvement of 32% in diarization error rate (DER) is obtained.

Index Terms: speaker diarization, speaker segmentation, speaker clustering, system combination, SDM
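As a reading aid for the relative DER figure quoted above, the following minimal sketch shows how a relative improvement is conventionally computed from a baseline DER and a new DER. The function name and the two DER values are hypothetical, chosen only for illustration; they are not results reported in the paper.

```python
def relative_der_improvement(baseline_der: float, new_der: float) -> float:
    """Return the relative DER reduction, in percent, with respect to the baseline."""
    if baseline_der <= 0:
        raise ValueError("baseline_der must be positive")
    return 100.0 * (baseline_der - new_der) / baseline_der


# Hypothetical DER values in percent (assumptions, not taken from the paper).
baseline = 25.0
integrated = 17.0
print(round(relative_der_improvement(baseline, integrated), 1))  # prints 32.0
```

Under these assumed values, an absolute drop of 8 DER points from a 25% baseline corresponds to a 32% relative improvement, which is why relative and absolute figures should not be compared directly.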
