Variational Bayesian speaker diarization of meeting recordings

This paper investigates the use of the Variational Bayesian (VB) framework for speaker diarization of meeting data, extending previous related work on Broadcast News audio. VB learning aims at maximizing a bound, known as the Free Energy, on the model marginal likelihood, and allows joint model learning and model selection according to the same objective function. While the Bayesian Information Criterion (BIC) is valid only in the asymptotic limit, the Free Energy is always a valid bound. The paper proposes the use of the Free Energy as the objective function in speaker diarization. It can be used to select dynamically, without any supervision or tuning, elements that typically affect diarization performance, i.e. the inferred number of speakers, the size of the GMM and the initialization. The proposed approach is compared with a conventional state-of-the-art system on the RT06 evaluation data for meeting recording diarization and shows a relative improvement of 8.4% in speaker error.
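
The claim that the Free Energy is always a valid bound on the marginal likelihood can be illustrated with a toy example. The sketch below is not the paper's diarization system: it assumes a single observation with a discrete latent variable (loosely, "which speaker produced this frame"), and the joint probabilities and the function names `free_energy` and `log_evidence` are hypothetical, chosen only for illustration. It computes F(q) = sum_z q(z) [ln p(x, z) - ln q(z)] and compares it with ln p(x).

```python
import math

def free_energy(joint, q):
    """Free energy F(q) = sum_z q(z) * (ln p(x, z) - ln q(z)) for a discrete latent z."""
    return sum(qz * (math.log(pz) - math.log(qz))
               for pz, qz in zip(joint, q) if qz > 0.0)

def log_evidence(joint):
    """Log marginal likelihood ln p(x) = ln sum_z p(x, z)."""
    return math.log(sum(joint))

# Hypothetical joint values p(x, z) for one observed x and three latent states z.
joint = [0.10, 0.03, 0.02]
evidence = log_evidence(joint)

# The exact posterior p(z | x) makes the bound tight.
exact_posterior = [p / sum(joint) for p in joint]
# Any other distribution over z gives a strictly smaller free energy.
uniform_q = [1.0 / 3.0] * 3

# The gap ln p(x) - F(q) equals KL(q || p(z | x)), which is never negative.
gap_exact = evidence - free_energy(joint, exact_posterior)
gap_uniform = evidence - free_energy(joint, uniform_q)
print(f"gap with exact posterior: {gap_exact:.3e}")
print(f"gap with uniform q:       {gap_uniform:.3e}")
```

The gap is zero (up to rounding) for the exact posterior and positive for the uniform distribution, so maximizing F(q) both tightens the bound and, when compared across candidate models, gives a criterion for model selection that does not rely on an asymptotic approximation.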
