Speaker identification using multi-step clustering algorithm with transformation-based GMM

To improve the performance of speaker recognition, an embedded linear transformation has been used to integrate a feature transformation and a diagonal-covariance Gaussian mixture model (GMM) into a unified framework. In that framework, however, the number of mixtures in the GMM must be fixed during model training. The cluster expectation-maximization (EM) algorithm is a well-known technique in which the mixture number is instead treated as a parameter to be estimated. This paper presents a new model structure that integrates a multi-step clustering algorithm into the estimation process of a GMM with an embedded transformation. In this approach, the transformation matrix, the mixture number, and the model parameters are estimated simultaneously under a maximum likelihood criterion. The proposed method is evaluated on text-independent speaker identification using a database consisting of three data sessions. The experiments show that it outperforms the traditional GMM trained with the cluster EM algorithm.
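How a speaker model with an embedded transformation scores speech can be sketched as follows. This is a minimal sketch, not the paper's implementation. It assumes the embedded transformation is a single square matrix A shared by all mixtures, so that by the change-of-variables formula a feature vector x has log-density log|det A| + log sum_k w_k N(Ax; mu_k, diag(var_k)). The class `TransformedGMM` and the functions `utterance_loglik` and `identify` are hypothetical names. The sketch covers only likelihood scoring and closed-set identification. It does not cover the multi-step clustering or EM training that estimate A, the mixture number, and the model parameters.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Sequence

Vector = Sequence[float]
Matrix = Sequence[Sequence[float]]


@dataclass
class TransformedGMM:
    """Hypothetical container: shared transform A plus a diagonal-covariance GMM."""
    A: Matrix                      # square feature transformation shared by all mixtures (assumption)
    weights: List[float]           # mixture weights, summing to 1
    means: List[List[float]]       # one mean vector per mixture, in the transformed space
    variances: List[List[float]]   # one diagonal variance vector per mixture


def mat_vec(A: Matrix, x: Vector) -> List[float]:
    return [sum(a * b for a, b in zip(row, x)) for row in A]


def log_abs_det(A: Matrix) -> float:
    """log|det A| via Gaussian elimination with partial pivoting."""
    M = [list(row) for row in A]
    n = len(M)
    total = 0.0
    for i in range(n):
        p = max(range(i, n), key=lambda r: abs(M[r][i]))
        if M[p][i] == 0.0:
            raise ValueError("transformation matrix is singular")
        M[i], M[p] = M[p], M[i]  # a row swap flips the sign of det A, not |det A|
        total += math.log(abs(M[i][i]))
        for r in range(i + 1, n):
            f = M[r][i] / M[i][i]
            for c in range(i, n):
                M[r][c] -= f * M[i][c]
    return total


def log_diag_gauss(y: Vector, mean: Vector, var: Vector) -> float:
    """Log-density of a Gaussian with diagonal covariance."""
    return -0.5 * sum(math.log(2.0 * math.pi * v) + (yi - m) ** 2 / v
                      for yi, m, v in zip(y, mean, var))


def utterance_loglik(frames: List[Vector], model: TransformedGMM) -> float:
    """Sum over frames of log|det A| + log sum_k w_k N(A x; mu_k, diag(var_k))."""
    jac = log_abs_det(model.A)  # Jacobian term, computed once per model
    total = 0.0
    for x in frames:
        y = mat_vec(model.A, x)
        comps = [math.log(w) + log_diag_gauss(y, m, v)
                 for w, m, v in zip(model.weights, model.means, model.variances)]
        top = max(comps)  # log-sum-exp over mixtures for numerical stability
        total += jac + top + math.log(sum(math.exp(c - top) for c in comps))
    return total


def identify(frames: List[Vector], models: Dict[str, TransformedGMM]) -> str:
    """Closed-set identification: return the speaker whose model gives the highest likelihood."""
    scores = {name: utterance_loglik(frames, m) for name, m in models.items()}
    return max(scores, key=scores.get)
```

Working in the log domain, with a log-sum-exp over mixtures, avoids underflow when an utterance contains many frames. The log|det A| term is what keeps likelihoods comparable across models with different transformations. Without it, a model could inflate its score simply by scaling the features.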
