SVM Based Speaker Verification using a GMM Supervector Kernel and NAP Variability Compensation

Gaussian mixture models with universal backgrounds (UBMs) have become the standard method for speaker recognition. Typically, a speaker model is constructed by MAP adaptation of the means of the UBM. A GMM supervector is constructed by stacking the means of the adapted mixture components. A recent discovery is that latent factor analysis of this GMM supervector is an effective method for variability compensation. We consider this GMM supervector in the context of support vector machines. We construct a support vector machine kernel using the GMM supervector. We show similarities based on this kernel between the method of SVM nuisance attribute projection (NAP) and the recent results in latent factor analysis. Experiments on a NIST SRE 2005 corpus demonstrate the effectiveness of the new technique

[1]  Sridha Sridharan,et al.  Modelling session variability in text-independent speaker verification , 2005, INTERSPEECH.

[2]  Douglas E. Sturim,et al.  Speaker adaptive cohort selection for Tnorm in text-independent speaker verification , 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005..

[3]  Douglas A. Reynolds,et al.  Speaker Verification Using Adapted Gaussian Mixture Models , 2000, Digit. Signal Process..

[4]  Nello Cristianini,et al.  Support vector machines , 2009 .

[5]  Patrick Kenny,et al.  Experiments in speaker verification using factor analysis likelihood ratios , 2004, Odyssey.

[6]  William M. Campbell,et al.  Generalized linear discriminant sequence kernels for speaker recognition , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[7]  Steve Renals,et al.  SVMSVM: support vector machine speaker verification methodology , 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03)..

[8]  M. Do Fast approximation of Kullback-Leibler distance for dependence trees and hidden Markov models , 2003, IEEE Signal Processing Letters.

[9]  Douglas E. Sturim,et al.  Support vector machines using GMM supervectors for speaker verification , 2006, IEEE Signal Processing Letters.

[10]  William M. Campbell,et al.  Advances in channel compensation for SVM speaker recognition , 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005..

[11]  Samy Bengio,et al.  SVMTorch: Support Vector Machines for Large-Scale Regression Problems , 2001, J. Mach. Learn. Res..

[12]  Frédéric Bimbot,et al.  Speaker diarization using bottom-up clustering based on a parameter-derived distance between adapted GMMs , 2004, INTERSPEECH.

[13]  I. Ntroduction The NIST Year 2005 Speaker Recognition Evaluation Plan 1 , .

[14]  Nuno Vasconcelos,et al.  A Kullback-Leibler Divergence Based Kernel for SVM Classification in Multimedia Applications , 2003, NIPS.

[15]  Alvin F. Martin,et al.  The NIST speaker recognition evaluation program , 2005 .

[16]  Andreas Christmann,et al.  Support vector machines , 2008, Data Mining and Knowledge Discovery Handbook.