Using MMSE to improve session variability estimation

This paper improves the Session Variability Subspace Projection (SVSP) method, a model-compensation approach to speaker verification, using the Minimum Mean Square Error (MMSE) criterion. The weakness of SVSP is that, when the SVSP matrix is estimated, each speaker's session-independent supervector is approximated by the average of all of that speaker's session-dependent GMM supervectors. This approximation introduces an error, which the proposed method aims to minimise under the MMSE criterion. Compared with the original SVSP, the proposed method achieves an error rate reduction of 6.7% in EER and 5.3% in minimum detection cost function on the NIST SRE 2006 1C4W dataset.
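
The averaging approximation described above can be illustrated with a minimal sketch. It shows only the step named in the abstract: taking the mean of a speaker's session-dependent supervectors as a stand-in for the session-independent supervector, and computing each session's deviation from that mean. The function names, the toy 3-dimensional vectors, and the speaker labels are all hypothetical. The sketch covers neither the estimation of the SVSP matrix nor the MMSE refinement, since the abstract does not specify them.

```python
from typing import Dict, List

Vector = List[float]


def mean_supervector(sessions: List[Vector]) -> Vector:
    """Element-wise average of one speaker's session-dependent supervectors.

    In the baseline described above, this average stands in for the
    speaker's session-independent supervector.
    """
    n = len(sessions)
    return [sum(column) / n for column in zip(*sessions)]


def session_residuals(sessions: List[Vector]) -> List[Vector]:
    """Deviation of each session supervector from the speaker's average."""
    mean = mean_supervector(sessions)
    return [[x - m for x, m in zip(vec, mean)] for vec in sessions]


# Toy data (assumption): real GMM supervectors have thousands of dimensions.
toy: Dict[str, List[Vector]] = {
    "spk_a": [[1.0, 2.0, 3.0], [3.0, 2.0, 1.0]],
    "spk_b": [[0.0, 0.0, 0.0], [2.0, 4.0, 6.0], [4.0, 8.0, 12.0]],
}

# Per-speaker deviations from the averaged supervector.
residuals = {spk: session_residuals(vecs) for spk, vecs in toy.items()}
```

Any gap between the averaged supervector and the true session-independent supervector carries over into these deviations, which is the error the abstract identifies.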
