Optimization of GMM training for speaker verification

EM training of Gaussian mixture models (GMMs) often suffers from local maxima and singularities in the likelihood space. In this paper, we present a new Modified Split-and-Merge EM (MSMEM) algorithm for speaker verification tasks, which performs split-and-merge operations to escape from local maxima and to reduce the chance of generating singularities. With two modified criteria for selecting split-and-merge candidates in speaker verification tasks, the overall likelihoods of both the training and the testing data are improved. Furthermore, a modified adaptive variance flooring is introduced into the new EM procedure. Experiments on synthetic data show the advantages of MSMEM. Equal error rate (EER) results with a global threshold on a speaker verification task using the TIMIT database confirm the improvement in system performance.
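To illustrate how variance flooring keeps EM away from singularities, the following is a minimal sketch of one EM iteration for a diagonal-covariance GMM. It is not the MSMEM procedure: the abstract does not specify the modified adaptive flooring or the split-and-merge criteria, so the sketch assumes a plain floor set to a fixed fraction of the global data variance. The function name `em_step` and the parameter `floor_scale` are hypothetical.

```python
import numpy as np

def em_step(X, weights, means, variances, floor_scale=0.01):
    """One EM iteration for a diagonal-covariance GMM with a simple variance floor.

    X: (n, d) data; weights: (k,); means, variances: (k, d).
    Assumption: the floor is floor_scale times the global per-dimension
    variance of X. This is a generic flooring rule, not the paper's.
    """
    n, _ = X.shape

    # E-step: log density of every point under every component, shape (n, k).
    diff = X[:, None, :] - means[None, :, :]
    log_prob = -0.5 * (
        np.sum(diff ** 2 / variances[None, :, :], axis=2)
        + np.sum(np.log(2.0 * np.pi * variances), axis=1)[None, :]
    )
    log_joint = log_prob + np.log(weights)[None, :]

    # Log-sum-exp for numerical stability when variances are tiny.
    peak = log_joint.max(axis=1, keepdims=True)
    log_norm = peak + np.log(np.exp(log_joint - peak).sum(axis=1, keepdims=True))
    resp = np.exp(log_joint - log_norm)  # responsibilities, rows sum to 1

    # M-step: re-estimate weights, means, and diagonal variances.
    nk = resp.sum(axis=0) + 1e-12  # guard against empty components
    new_weights = nk / n
    new_means = (resp.T @ X) / nk[:, None]
    new_vars = (resp.T @ X ** 2) / nk[:, None] - new_means ** 2

    # Variance flooring: no variance may collapse below the floor, which
    # prevents a component from shrinking onto a single data point.
    floor = floor_scale * X.var(axis=0)
    new_vars = np.maximum(new_vars, floor[None, :])

    # Log-likelihood of the data under the parameters passed in.
    return new_weights, new_means, new_vars, float(log_norm.sum())
```

Without the `np.maximum` line, a component whose responsibilities concentrate on one point would drive its variance toward zero and the likelihood toward infinity, which is the singularity the abstract refers to.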
