Comparison of input and feature space nonlinear kernel nuisance attribute projections for speaker verification

Nuisance attribute projection (NAP) is an effective method for reducing session variability in SVM-based speaker verification systems. Because the expanded feature space of a nonlinear kernel is usually high- or infinite-dimensional, it is difficult to find nuisance directions via conventional eigenvalue analysis and to perform the projection directly in the feature space. In this paper, two approaches to nonlinear kernel NAP are investigated and compared. In the first approach, the NAP projection is formulated in the expanded feature space, and kernel PCA is employed to carry out the kernel eigenvalue analysis. In the second approach, a gradient descent algorithm is proposed to find the projection over input variables. Experimental results on the 2006 NIST SRE corpus show that both kinds of NAP can reduce unwanted variability in nonlinear kernels and improve verification performance, and that NAP performed in the expanded feature space using kernel PCA obtains slightly better performance than NAP over input variables.
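To make the feature-space approach concrete, the following is a minimal sketch of NAP carried out through kernel PCA. It is an illustration under stated assumptions, not the paper's exact formulation: the RBF kernel, the within-speaker centering used to define session variability, and all function names (`rbf_kernel`, `within_speaker_centering`, `fit_kernel_nap`, `nap_kernel`) are hypothetical choices made for this example. The nuisance directions are never formed explicitly; only their expansion coefficients over the training points are stored, and the compensated kernel is obtained by subtracting the nuisance components from the original kernel value.

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.5):
    # Assumed kernel for illustration; any positive-definite kernel could be used.
    sq = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def within_speaker_centering(labels):
    # C = I - M, where M averages over sessions of the same speaker.
    # Applying C to the mapped features removes each speaker's mean,
    # leaving only session (nuisance) variability.
    labels = np.asarray(labels)
    same = (labels[:, None] == labels[None, :]).astype(float)
    M = same / same.sum(axis=1, keepdims=True)
    return np.eye(len(labels)) - M

def fit_kernel_nap(X, labels, n_dirs, kernel=rbf_kernel):
    # Kernel PCA on the session-variability Gram matrix C K C^T.
    C = within_speaker_centering(labels)
    K_nuis = C @ kernel(X, X) @ C.T
    evals, evecs = np.linalg.eigh(K_nuis)
    order = np.argsort(evals)[::-1][:n_dirs]
    lam, A = evals[order], evecs[:, order]
    keep = lam > 1e-10  # drop numerically null directions
    # Column j holds the expansion coefficients of the unit-norm
    # nuisance direction v_j over the training points.
    return C.T @ (A[:, keep] / np.sqrt(lam[keep]))

def nap_kernel(Xa, Xb, X_train, W, kernel=rbf_kernel):
    # k'(x, y) = k(x, y) - sum_j (v_j . phi(x)) (v_j . phi(y))
    Pa = kernel(Xa, X_train) @ W
    Pb = kernel(Xb, X_train) @ W
    return kernel(Xa, Xb) - Pa @ Pb.T
```

The matrix returned by `nap_kernel` can be passed to an SVM trainer that accepts precomputed kernels. The second approach described above, which searches for a projection over the input variables by gradient descent, is not sketched here because the abstract does not specify its objective.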
