Research on Intersession Variability Compensation for MLLR-SVM Speaker Recognition

One of the most important challenges in speaker recognition is intersession variability (ISV), primarily cross-channel effects. Recent NIST speaker recognition evaluations (SRE) include a multilingual scenario with training conversations involving multilingual speakers collected in a number of other languages, leading to further performance decline. One important reason for this is that more and more researchers are using phonetic clustering to introduce high level information to improve speaker recognition. But such language dependent methods do not work well in multilingual conditions. In this paper, we study both language and channel mismatch using a support vector machine (SVM) speaker recognition system. Maximum likelihood linear regression (MLLR) transforms adapting a universal background model (UBM) are adopted as features. We first introduce a novel language independent statistical binary-decision tree to reduce multi-language effects, and compare this data-driven approach with a traditional knowledge based one. We also construct a framework for channel compensation using feature-domain latent factor analysis (LFA) and MLLR supervector kernel-based nuisance attribute projection (NAP) in the model-domain. Results on the NIST SRE 2006 1conv4w-1conv4w/mic corpus show significant improvement. We also compare our compensated MLLR-SVM system with state-of-the-art cepstral Gaussian mixture and SVM systems, and combine them for a further improvement.

[1]  William M. Campbell,et al.  Generalized linear discriminant sequence kernels for speaker recognition , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[2]  Patrick Kenny,et al.  A Study of Interspeaker Variability in Speaker Verification , 2008, IEEE Transactions on Audio, Speech, and Language Processing.

[3]  Patrick Kenny,et al.  Factor analysis simplified [speaker verification applications] , 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005..

[4]  Joseph P. Campbell,et al.  Gender-dependent phonetic refraction for speaker recognition , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[5]  Tsuhan Chen,et al.  Improved speaker verification through probabilistic subspace adaptation , 2003, INTERSPEECH.

[6]  Patrick Kenny,et al.  Joint Factor Analysis Versus Eigenchannels in Speaker Recognition , 2007, IEEE Transactions on Audio, Speech, and Language Processing.

[7]  William M. Campbell,et al.  Advances in channel compensation for SVM speaker recognition , 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005..

[8]  Jirí Navrátil,et al.  Spoken language recognition-a step toward multilinguality in speech processing , 2001, IEEE Trans. Speech Audio Process..

[9]  Lalit R. Bahl,et al.  A tree-based statistical language model for natural language speech recognition , 1989, IEEE Trans. Acoust. Speech Signal Process..

[10]  Andreas Stolcke,et al.  Speaker Recognition With Session Variability Normalization Based on MLLR Adaptation Transforms , 2007, IEEE Transactions on Audio, Speech, and Language Processing.

[11]  Andreas Stolcke,et al.  MLLR transforms as features in speaker recognition , 2005, INTERSPEECH.

[12]  Christopher M. Bishop,et al.  Mixtures of Probabilistic Principal Component Analyzers , 1999, Neural Computation.

[13]  Pavel Matejka,et al.  Phonotactic language identification using high quality phoneme recognition , 2005, INTERSPEECH.

[14]  Pietro Laface,et al.  Channel Factors Compensation in Model and Feature Domain for Speaker Recognition , 2006, 2006 IEEE Odyssey - The Speaker and Language Recognition Workshop.

[15]  Andreas Stolcke,et al.  Improvements in MLLR-Transform-based Speaker Recognition , 2006, 2006 IEEE Odyssey - The Speaker and Language Recognition Workshop.

[16]  Philip A. Chou,et al.  Optimal Partitioning for Classification and Regression Trees , 1991, IEEE Trans. Pattern Anal. Mach. Intell..

[17]  Liu Jia Research on MLLR Based Speaker Recognition Algorithm , 2009 .

[18]  Qin Jin,et al.  Phonetic speaker recognition using maximum-likelihood binary-decision tree models , 2003, 2003 IEEE International Conference on Acoustics, Speech, and Signal Processing, 2003. Proceedings. (ICASSP '03)..

[19]  William M. Campbell,et al.  A new kernel for SVM MLLR based speaker recognition , 2007, INTERSPEECH.

[20]  Patrick Kenny,et al.  Improvements in Factor Analysis Based Speaker Verification , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.

[21]  Douglas E. Sturim,et al.  SVM Based Speaker Verification using a GMM Supervector Kernel and NAP Variability Compensation , 2006, 2006 IEEE International Conference on Acoustics Speech and Signal Processing Proceedings.

[22]  William M. Campbell,et al.  A multi-class MLLR kernel for SVM speaker recognition , 2008, 2008 IEEE International Conference on Acoustics, Speech and Signal Processing.

[23]  Douglas A. Reynolds,et al.  Speaker Verification Using Adapted Gaussian Mixture Models , 2000, Digit. Signal Process..

[24]  William M. Campbell,et al.  Channel compensation for SVM speaker recognition , 2004, Odyssey.