A discriminative training algorithm for Gaussian mixture speaker models

The Gaussian mixture speaker model (GMM) is usually trained with the expectation-maximization (EM) algorithm to maximize the likelihood of the observation data from an individual class, i.e., under the maximum likelihood (ML) criterion. A GMM trained under the ML criterion has weak discriminative power when used as a classifier. In this paper, a discriminative training procedure is proposed to fine-tune the parameters of the GMMs. The goal of the training is to reduce the number of misclassified vector groups. Since a vector group can be thought of as derived from a short sentence, this training procedure optimizes speaker identification performance more directly. Although the algorithm is based on a heuristic idea, it works well for many practical problems, and training is very fast. In an evaluation experiment with the YOHO database, when each speaker is modeled with 8 mixtures, the identification rate increases from 83.8% to 92.4% after applying this discriminative training algorithm.
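To make the training objective concrete, the following is a minimal sketch of how misclassified vector groups might be counted. It is not the paper's training algorithm: it only illustrates the quantity the abstract says is being reduced. It assumes diagonal-covariance GMMs, fixed-size non-overlapping groups, and a decision rule that picks the speaker whose model gives the highest summed log-likelihood over the group. All function names and the `group_size` parameter are hypothetical.

```python
import numpy as np

def gmm_log_likelihood(x, weights, means, variances):
    """Per-vector log-likelihood under a diagonal-covariance GMM (assumed form).
    x: (T, D); weights: (M,); means, variances: (M, D)."""
    diff = x[:, None, :] - means[None, :, :]                      # (T, M, D)
    log_comp = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                       + np.sum(np.log(2 * np.pi * variances), axis=1))  # (T, M)
    log_weighted = log_comp + np.log(weights)
    # Log-sum-exp over mixture components for numerical stability.
    m = log_weighted.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(log_weighted - m).sum(axis=1, keepdims=True)))[:, 0]

def classify_group(group, models):
    """Return the index of the speaker model with the highest
    summed log-likelihood over all vectors in the group."""
    scores = [gmm_log_likelihood(group, *model).sum() for model in models]
    return int(np.argmax(scores))

def count_misclassified_groups(frames, label, models, group_size):
    """Split one speaker's vectors into consecutive groups of `group_size`
    (hypothetical parameter) and count the groups assigned to a wrong speaker."""
    n_groups = len(frames) // group_size
    errors = 0
    for g in range(n_groups):
        group = frames[g * group_size:(g + 1) * group_size]
        if classify_group(group, models) != label:
            errors += 1
    return errors, n_groups
```

Here each model is a `(weights, means, variances)` tuple. A discriminative procedure of the kind described would adjust these parameters so that the error count over the training groups decreases, whereas EM training alone only increases each model's likelihood on its own speaker's data.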
