UBM-GMM Driven Discriminative Approach for Speaker Verification

In the past few years, discriminative approaches to speaker detection have shown good results and attracted increasing interest. Among these methods, SVM-based systems have several advantages, notably their ability to handle high-dimensional feature spaces. Generative systems such as UBM-GMM systems achieve the best performance among current systems in speaker verification tasks. Combining generative and discriminative approaches is not a new idea: it has been studied several times by mapping a whole speech utterance onto a fixed-length vector. This paper presents a straightforward, low-cost method for combining the two approaches that uses only a UBM to drive the experiment. We show that the TFLLR kernel, while closely related to a reduced form of the Fisher mapping, yields performance close to that of a standard UBM-GMM speaker detection system. Moreover, we show that a combination of the two outperforms either system taken independently.
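To make the idea of a UBM-driven fixed-length mapping more concrete, here is a minimal sketch of one way a TFLLR-style kernel could be computed from a UBM alone. It is not taken from the paper. It assumes that an utterance is summarized by the average posterior probability of each UBM component over its frames, and that the background probabilities used for TFLLR scaling are the UBM mixture weights. The function names, the diagonal-covariance UBM, and the toy values are all hypothetical.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at frame x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def component_occupancy(frames, weights, means, variances):
    """Average posterior of each UBM component over the frames (assumed mapping)."""
    occupancy = [0.0] * len(weights)
    for x in frames:
        logs = [
            math.log(w) + log_gauss_diag(x, m, v)
            for w, m, v in zip(weights, means, variances)
        ]
        top = max(logs)  # subtract the maximum for numerical stability
        post = [math.exp(value - top) for value in logs]
        total = sum(post)
        for i, p in enumerate(post):
            occupancy[i] += p / total
    return [o / len(frames) for o in occupancy]


def tfllr_map(occupancy, weights):
    """Scale each entry by 1/sqrt(background probability), here the UBM weight."""
    return [o / math.sqrt(w) for o, w in zip(occupancy, weights)]


def tfllr_kernel(occ_a, occ_b, weights):
    """Linear kernel between two TFLLR-scaled occupancy vectors."""
    mapped_a = tfllr_map(occ_a, weights)
    mapped_b = tfllr_map(occ_b, weights)
    return sum(a * b for a, b in zip(mapped_a, mapped_b))


# Toy two-component, one-dimensional UBM (hypothetical values).
ubm_weights = [0.5, 0.5]
ubm_means = [[0.0], [5.0]]
ubm_variances = [[1.0], [1.0]]

utt_a = [[0.1], [-0.2]]  # frames near the first component
utt_b = [[4.9], [5.3]]   # frames near the second component

occ_a = component_occupancy(utt_a, ubm_weights, ubm_means, ubm_variances)
occ_b = component_occupancy(utt_b, ubm_weights, ubm_means, ubm_variances)
same = tfllr_kernel(occ_a, occ_a, ubm_weights)
different = tfllr_kernel(occ_a, occ_b, ubm_weights)
```

Under these assumptions, every utterance maps to a vector with one entry per UBM component regardless of its length, and the resulting kernel values could be passed to any SVM trainer that accepts a precomputed kernel matrix.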
