Nuisance Attribute Projection

Cross-channel degradation is one of the significant challenges facing speaker recognition systems. We study this problem in the support vector machine (SVM) context and nuisance variable compensation in high-dimensional spaces more generally. We present an approach to nuisance variable compensation by removing nuisance attribute-related dimensions in the SVM expansion space via projections. Training to remove these dimensions is accomplished via an eigenvalue problem. The eigenvalue problem attempts to reduce multisession variation for the same speaker, reduce different channel effects, and increase “distance” between different speakers. Experiments show significant improvement in performance for the cross-channel case.

[1]  R. C. Whaley,et al.  Minimizing development and maintenance costs in supporting persistently optimized BLAS , 2005, Softw. Pract. Exp..

[2]  Samy Bengio,et al.  SVMTorch: Support Vector Machines for Large-Scale Regression Problems , 2001, J. Mach. Learn. Res..

[3]  Douglas E. Sturim,et al.  SVM Based Speaker Verification using a GMM Supervector Kernel and NAP Variability Compensation , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.

[4]  Chao Yang,et al.  ARPACK users' guide - solution of large-scale eigenvalue problems with implicitly restarted Arnoldi methods , 1998, Software, environments, tools.

[5]  Ed Anderson,et al.  LAPACK Users' Guide , 1995 .

[6]  Larry P. Heck,et al.  Handset-dependent background models for robust text-independent speaker recognition , 1997, 1997 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[7]  Douglas A. Reynolds,et al.  Channel robust speaker verification via feature mapping , 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03)..

[8]  William M. Campbell,et al.  Fusing discriminative and generative methods for speaker recognition: experiments on switchboard and NFI/TNO field data , 2004, Odyssey.

[9]  Francisco A. M. Gomes,et al.  ARPACK++: A C++ implementation of ARPACK eigenvalue package , 1997 .

[10]  Douglas A. Reynolds,et al.  Speaker Verification Using Adapted Gaussian Mixture Models , 2000, Digit. Signal Process..

[11]  Andreas Christmann,et al.  Support vector machines , 2008, Data Mining and Knowledge Discovery Handbook.

[12]  Richard J. Mammone,et al.  Speaker recognition using neural networks and conventional classifiers , 1994, IEEE Trans. Speech Audio Process..

[13]  Heekuck Oh,et al.  Neural Networks for Pattern Recognition , 1993, Adv. Comput..

[14]  William M. Campbell,et al.  Advances in channel compensation for SVM speaker recognition , 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005..

[15]  Joseph P. Campbell,et al.  Gender-dependent phonetic refraction for speaker recognition , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[16]  Lawrence G. Bahler,et al.  Speaker verification using randomized phrase prompting , 1991, Digit. Signal Process..

[17]  William M. Campbell,et al.  Channel compensation for SVM speaker recognition , 2004, Odyssey.

[18]  William M. Campbell,et al.  Generalized linear discriminant sequence kernels for speaker recognition , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[19]  Robert H. Halstead,et al.  Matrix Computations , 2011, Encyclopedia of Parallel Computing.

[20]  George R. Doddington,et al.  Speaker recognition based on idiolectal differences between speakers , 2001, INTERSPEECH.

[21]  William M. Campbell,et al.  Support vector machines for speaker and language recognition , 2006, Comput. Speech Lang..

[22]  William M. Campbell,et al.  High-level speaker verification with support vector machines , 2004, 2004 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[23]  Douglas A. Reynolds,et al.  Fusing high- and low-level features for speaker recognition , 2003, INTERSPEECH.