Bayesian speaker recognition using Gaussian mixture model and Laplace approximation

This paper presents a Bayesian approach to Gaussian mixture model (GMM)-based speaker identification. Some approaches evaluate the speaker score of a test utterance using a single data likelihood under a GMM whose parameters are point estimates obtained with the maximum likelihood or maximum a posteriori criterion. In contrast, the Bayesian approach evaluates the score as the expectation of the data likelihood over the posterior distribution of the model parameters, which takes the form of a Bayesian integral. However, because this integral cannot be derived analytically, we apply the Laplace approximation to it. Theoretically, we show that the proposed Bayesian approach is equivalent to the GMM-UBM approach when infinite training data is available for each speaker. The results of speaker identification experiments on the TIMIT corpus show that the proposed Bayesian approach consistently outperforms the GMM-UBM approach under very limited training data conditions, although the improvement is not very significant.
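The difference between point-estimate scoring and Bayesian scoring can be illustrated with a toy sketch. It is not the paper's derivation: instead of a GMM it assumes a one-dimensional Gaussian with known variance and an unknown mean, and all function names are hypothetical. In this simplified setting the integrand is itself Gaussian, so the Laplace approximation happens to be exact, which makes the result easy to check against the closed form.

```python
import math


def log_normal(x, mean, var):
    """Log density of a 1-D Gaussian N(x; mean, var)."""
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)


def posterior_mean_var(data, prior_mean, prior_var, noise_var):
    """Conjugate posterior over the unknown mean (toy stand-in for speaker training)."""
    n = len(data)
    post_var = 1.0 / (1.0 / prior_var + n / noise_var)
    post_mean = post_var * (prior_mean / prior_var + sum(data) / noise_var)
    return post_mean, post_var


def point_log_score(x, post_mean, noise_var):
    """Point-estimate score: a single likelihood evaluated at the MAP mean."""
    return log_normal(x, post_mean, noise_var)


def laplace_log_score(x, post_mean, post_var, noise_var):
    """Laplace approximation of log of the integral of p(x|theta) p(theta|D) over theta.

    g(theta) = log p(x|theta) + log p(theta|D) is expanded around its mode;
    the integral is approximated by exp(g(mode)) * sqrt(2*pi / -g''(mode)).
    """
    precision = 1.0 / noise_var + 1.0 / post_var  # equals -g''(theta)
    mode = (x / noise_var + post_mean / post_var) / precision
    g_mode = log_normal(x, mode, noise_var) + log_normal(mode, post_mean, post_var)
    return g_mode + 0.5 * math.log(2 * math.pi / precision)


# Assumed toy values: two training samples, a broad prior, unit noise variance.
few_mean, few_var = posterior_mean_var([1.0, 1.2], 0.0, 1.0, 1.0)
many_mean, many_var = posterior_mean_var([1.0] * 100000, 0.0, 1.0, 1.0)
x_test = 2.0
gap_few = abs(laplace_log_score(x_test, few_mean, few_var, 1.0)
              - point_log_score(x_test, few_mean, 1.0))
gap_many = abs(laplace_log_score(x_test, many_mean, many_var, 1.0)
               - point_log_score(x_test, many_mean, 1.0))
```

In this sketch the gap between the two scores shrinks as the amount of training data grows, because the posterior collapses onto a single point. This mirrors, in a much simpler model, the abstract's statement that the Bayesian approach becomes equivalent to the point-estimate approach with infinite training data.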
