A Multi-Model Method for Short-Utterance Speaker Recognition

The length of the test speech greatly influences the performance of GMM-UBM based text-independent speaker recognition system, for example when the length of valid speech is as short as 1~5 seconds, the performance decreases significantly because the GMM-UBM based speaker recognition method is a statistical one, of which sufficient data is the foundation. Considering that the use of text information will be helpful to speaker recognition, a multi-model method is proposed to improve short-utterance speaker recognition (SUSR) in Chinese. We build a few phoneme class models for each speaker to represent different parts of the characteristic space and fuse the scores to fit the test data on the models with the purpose of increasing the matching degree between training models and test utterance. Experimental results showed that the proposed method achieved a relative EER reduction of about 26% compared with the traditional GMM-UBM method.

[1]  Sridha Sridharan,et al.  Factor analysis subspace estimation for speaker verification with short utterances , 2008, INTERSPEECH.

[2]  Sridha Sridharan,et al.  Making Confident Speaker Verification Decisions With Minimal Speech , 2010, IEEE Transactions on Audio, Speech, and Language Processing.

[3]  Douglas E. Sturim,et al.  SVM Based Speaker Verification using a GMM Supervector Kernel and NAP Variability Compensation , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.

[4]  Lawrence R. Rabiner,et al.  A tutorial on hidden Markov models and selected applications in speech recognition , 1989, Proc. IEEE.

[5]  Jeff A. Bilmes,et al.  A gentle tutorial of the em algorithm and its application to parameter estimation for Gaussian mixture and hidden Markov models , 1998 .

[6]  Thomas Fang Zheng,et al.  Creation of Time-Varying Voiceprint Database , 2010 .

[7]  P. Sivakumaran,et al.  On the enhancement of speaker identification accuracy using weighted bilateral scoring , 2008, 2008 42nd Annual IEEE International Carnahan Conference on Security Technology.

[8]  Douglas A. Reynolds,et al.  Speaker Verification Using Adapted Gaussian Mixture Models , 2000, Digit. Signal Process..

[9]  Wang Dong Multi-Layer Channel Normalization for Frequency-Dynamic Feature Extraction , 2003 .

[10]  Chin-Hui Lee,et al.  Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains , 1994, IEEE Trans. Speech Audio Process..

[11]  Shrikanth S. Narayanan,et al.  Robust speaker identification based on selective use of feature vectors , 2007, Pattern Recognit. Lett..

[12]  Jr. J.P. Campbell,et al.  Speaker recognition: a tutorial , 1997, Proc. IEEE.

[13]  Eliathamby Ambikairajah,et al.  A segment selection technique for speaker verification , 2010, Speech Commun..