Source counting in speech mixtures by nonparametric Bayesian estimation of an infinite Gaussian mixture model

In this paper we present a source counting algorithm that determines the number of speakers in a speech mixture. The proposed method models the histogram of estimated directions of arrival with a nonparametric Bayesian infinite Gaussian mixture model. As an alternative to classical model selection criteria, and to avoid specifying the maximum number of mixture components in advance, a Dirichlet process prior is placed over the mixture components. This allows the number of mixture components that most probably explains the observations to be determined automatically. We demonstrate experimentally that this model outperforms a parametric approach that uses a finite Gaussian mixture model with a Dirichlet distribution prior over the mixture weights.
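
To make the role of the Dirichlet process prior concrete, the following is a minimal sketch and not the inference scheme of the paper. It runs a collapsed Gibbs sampler for a one-dimensional Dirichlet process mixture of Gaussians over direction-of-arrival estimates in degrees. Everything in it is an assumption made for illustration: the function name `count_sources`, a known and shared observation variance `sigma2`, a conjugate normal prior on the component means (`mu0`, `tau2`), the concentration parameter `alpha`, and the `min_share` threshold that discards clusters holding only a few points.

```python
import math
import random


def normal_pdf(x, mean, var):
    """Density of a univariate normal distribution."""
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def count_sources(doas, alpha=1.0, sigma2=25.0, mu0=0.0, tau2=2500.0,
                  sweeps=50, min_share=0.05, seed=0):
    """Estimate the number of clusters in 1-D DOA estimates (hypothetical helper).

    Assumes a known observation variance sigma2 and a N(mu0, tau2) prior on
    each cluster mean; the number of clusters is never fixed in advance.
    """
    rng = random.Random(seed)
    labels = [0] * len(doas)          # start with all points in one cluster
    counts = {0: len(doas)}           # cluster label -> number of points
    sums = {0: sum(doas)}             # cluster label -> sum of its points
    next_label = 1
    for _ in range(sweeps):
        for i, x in enumerate(doas):
            # Remove point i from its current cluster.
            k_old = labels[i]
            counts[k_old] -= 1
            sums[k_old] -= x
            if counts[k_old] == 0:
                del counts[k_old], sums[k_old]
            # Existing clusters: weight = size * posterior predictive density.
            candidates, weights = [], []
            for k, n in counts.items():
                post_var = 1.0 / (1.0 / tau2 + n / sigma2)
                post_mean = post_var * (mu0 / tau2 + sums[k] / sigma2)
                candidates.append(k)
                weights.append(n * normal_pdf(x, post_mean, post_var + sigma2))
            # New cluster: weight = alpha * prior predictive density.
            candidates.append(next_label)
            weights.append(alpha * normal_pdf(x, mu0, tau2 + sigma2))
            k_new = rng.choices(candidates, weights=weights)[0]
            if k_new == next_label:
                counts[k_new] = 0
                sums[k_new] = 0.0
                next_label += 1
            counts[k_new] += 1
            sums[k_new] += x
            labels[i] = k_new
    # Report only clusters that hold a non-negligible share of the data.
    return sum(1 for n in counts.values() if n >= min_share * len(doas))


if __name__ == "__main__":
    # Synthetic DOA estimates for three hypothetical speakers.
    data_rng = random.Random(1)
    doas = [data_rng.gauss(m, 5.0) for m in (-40.0, 0.0, 45.0) for _ in range(100)]
    print(count_sources(doas))
```

The sampler can open a new cluster at any step with probability proportional to `alpha`, so no upper bound on the number of components has to be supplied. A finite mixture with a Dirichlet prior on the weights, by contrast, requires that bound to be chosen beforehand.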
