Soft frame margin estimation of Gaussian Mixture Models for speaker recognition with sparse training data

Discriminative training (DT) methods for acoustic modeling, such as maximum mutual information (MMI), minimum classification error (MCE), and support vector machines (SVM), have proven effective in speaker recognition. In this paper we propose a DT method for Gaussian mixture models (GMMs) based on soft frame margin estimation. Unlike other DT methods such as MMI or MCE, soft frame margin estimation aims to improve the generalization of the GMM to unseen data when there is a mismatch between the training data and the unseen data. We define an objective function that combines a multi-class separation frame margin with a loss function, both expressed as functions of GMM likelihoods. We optimize this objective function with a convex optimization technique, semidefinite programming (SDP). Our experimental results show that the proposed soft frame margin discriminative training with semidefinite programming optimization (SFME-SDP) is very effective for robust speaker model training when only limited amounts of training data are available.
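To make the idea of a frame margin built from GMM likelihoods concrete, the following is a minimal sketch and not the formulation used in the paper. It assumes diagonal-covariance GMMs, defines the frame margin as the log-likelihood of the true speaker minus that of the best competing speaker, and combines a hinge loss with a margin term in the form lam / rho + mean loss, which is the general shape used in soft margin estimation. The function names and the parameters `rho` and `lam` are hypothetical, and the semidefinite programming step is not shown.

```python
import math


def gmm_frame_loglik(x, weights, means, variances):
    """Log-likelihood of one feature frame x under a diagonal-covariance GMM."""
    comp_logs = []
    for w, mu, var in zip(weights, means, variances):
        ll = math.log(w)
        for xd, md, vd in zip(x, mu, var):
            ll += -0.5 * (math.log(2.0 * math.pi * vd) + (xd - md) ** 2 / vd)
        comp_logs.append(ll)
    # Log-sum-exp over mixture components for numerical stability.
    m = max(comp_logs)
    return m + math.log(sum(math.exp(c - m) for c in comp_logs))


def frame_margin(x, true_idx, models):
    """Assumed margin: true-speaker score minus the best competing score."""
    scores = [gmm_frame_loglik(x, *model) for model in models]
    best_other = max(s for i, s in enumerate(scores) if i != true_idx)
    return scores[true_idx] - best_other


def soft_margin_objective(frames, labels, models, rho, lam):
    """Hypothetical objective: lam / rho plus the mean hinge loss on frame margins."""
    losses = [
        max(0.0, rho - frame_margin(x, y, models))
        for x, y in zip(frames, labels)
    ]
    return lam / rho + sum(losses) / len(losses)


# Two toy speakers, each a single-component 1-D GMM: (weights, means, variances).
models = [
    ([1.0], [[0.0]], [[1.0]]),
    ([1.0], [[2.0]], [[1.0]]),
]
frames = [[0.0], [1.0]]   # both frames belong to speaker 0
labels = [0, 0]
objective = soft_margin_objective(frames, labels, models, rho=1.0, lam=1.0)
```

In this toy setup the first frame lies well inside the margin and contributes no loss, while the second frame sits on the decision boundary and is penalized. A larger `rho` demands wider separation at the cost of more frames incurring loss.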
