Paper information - Advances in channel compensation for SVM speaker recognition

Cross-channel degradation is one of the significant challenges facing speaker recognition systems. We study the problem for speaker recognition using support vector machines (SVMs). We perform channel compensation in SVM modeling by removing non-speaker nuisance dimensions in the SVM expansion space via projections. Training to remove these dimensions is accomplished via an eigenvalue problem. The eigenvalue problem attempts to reduce multisession variation for the same speaker, reduce different channel effects, and increase "distance" between different speakers. We apply our methods to a subset of the Switchboard 2 corpus. Experiments show dramatic improvement in performance for the cross-channel case.
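The projection idea in the abstract can be illustrated with a minimal sketch. It is not the authors' formulation: the abstract describes an eigenvalue problem that also accounts for channel effects and between-speaker distance, whereas this simplified version assumes the nuisance directions are just the leading eigenvectors of the within-speaker (multisession) scatter matrix. The function name `nuisance_projection`, its parameters, and the synthetic data are all hypothetical.

```python
import numpy as np

def nuisance_projection(X, speaker_ids, num_nuisance_dims):
    """Build P = I - U U^T, where the columns of U are the leading
    eigenvectors of the within-speaker scatter matrix.

    Assumption (not from the paper): within-speaker scatter alone is
    used as a stand-in for the full eigenvalue problem.
    """
    X = np.asarray(X, dtype=float)
    speaker_ids = np.asarray(speaker_ids)
    dim = X.shape[1]
    scatter = np.zeros((dim, dim))
    for spk in np.unique(speaker_ids):
        sessions = X[speaker_ids == spk]
        centered = sessions - sessions.mean(axis=0)  # remove the speaker mean
        scatter += centered.T @ centered
    # eigh returns eigenvalues in ascending order for symmetric matrices
    _, eigvecs = np.linalg.eigh(scatter)
    U = eigvecs[:, -num_nuisance_dims:]  # directions of largest session variation
    return np.eye(dim) - U @ U.T

# Synthetic data: each speaker has a fixed mean vector, and each session
# adds a large random shift along one "channel" direction plus small noise.
rng = np.random.default_rng(0)
dim, num_speakers, sessions_per_speaker = 5, 4, 6
speaker_means = rng.normal(size=(num_speakers, dim))
channel_dir = np.zeros(dim)
channel_dir[0] = 1.0

rows, ids = [], []
for spk in range(num_speakers):
    for _ in range(sessions_per_speaker):
        channel_shift = 3.0 * rng.normal() * channel_dir
        noise = 0.05 * rng.normal(size=dim)
        rows.append(speaker_means[spk] + channel_shift + noise)
        ids.append(spk)
X = np.array(rows)
ids = np.array(ids)

P = nuisance_projection(X, ids, num_nuisance_dims=1)
X_compensated = X @ P  # P is symmetric, so this projects each row
```

In an SVM system, the same projection would be applied to the expansion-space vectors before training and scoring, so that the kernel ignores the removed directions. How many dimensions to remove is a tuning choice that this sketch leaves open.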
