On the convergence of Gaussian mixture models: improvements through vector quantization

This paper studies how strongly a closed-set Speaker Identification system based on Gaussian Mixture Models (GMMs) relies on model convergence, and describes methods to improve that convergence. It shows that Vector Quantisation GMMs (VQGMMs) outperform a simple GMM mainly because the complexity of the data is reduced during training. In addition, the VQGMM system is shown to be less computationally complex than the traditional GMM, yielding a system that is quicker to train and gives higher performance. We also investigate four different VQ distance measures that can be used in training a VQGMM and compare their respective performances. The improvements gained by the VQGMM are found to be only marginally dependent on the distance measure.
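The abstract does not spell out the VQGMM training procedure, so the following is only a minimal sketch of one plausible reading: a VQ codebook partitions the training vectors, and each partition supplies the weight, mean, and diagonal variance of one Gaussian component. The function names (`vq_codebook`, `gmm_from_vq`, `avg_log_likelihood`), the squared Euclidean distance, and the variance floor are all assumptions made for illustration; the four distance measures compared in the paper are not named here and are not reproduced.

```python
import numpy as np

def _nearest(X, codebook):
    # Squared Euclidean distance from every vector to every codeword
    # (an assumed distance measure, chosen only for illustration).
    d = ((X[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)

def vq_codebook(X, k, n_iter=20, seed=0):
    """Plain k-means VQ: returns the codebook and each vector's cell index."""
    rng = np.random.default_rng(seed)
    codebook = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        labels = _nearest(X, codebook)
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:  # leave an empty cell's codeword unchanged
                codebook[j] = members.mean(axis=0)
    return codebook, _nearest(X, codebook)

def gmm_from_vq(X, labels, k, var_floor=1e-3):
    """Estimate one diagonal-covariance Gaussian per VQ cell."""
    n, dim = X.shape
    weights = np.zeros(k)
    means = np.zeros((k, dim))
    variances = np.ones((k, dim))
    for j in range(k):
        members = X[labels == j]
        if len(members) == 0:
            continue  # empty cell keeps zero weight
        weights[j] = len(members) / n
        means[j] = members.mean(axis=0)
        variances[j] = np.maximum(members.var(axis=0), var_floor)
    return weights, means, variances

def avg_log_likelihood(X, weights, means, variances):
    """Average per-vector log-likelihood under the diagonal GMM."""
    diff = X[:, None, :] - means[None, :, :]
    log_comp = -0.5 * (
        np.log(2.0 * np.pi * variances).sum(axis=1)[None, :]
        + (diff ** 2 / variances[None, :, :]).sum(axis=2)
    )
    with np.errstate(divide="ignore"):
        a = log_comp + np.log(weights)[None, :]  # zero weights become -inf
    m = a.max(axis=1, keepdims=True)
    return float((m[:, 0] + np.log(np.exp(a - m).sum(axis=1))).mean())
```

In a closed-set identification setting, one such model would be built per enrolled speaker and a test utterance assigned to the speaker whose model gives the highest `avg_log_likelihood`. Because the component parameters come directly from the VQ partition, this sketch involves no iterative EM step at all; whether the paper's system refines the components further is not stated in the abstract.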
