Mismatch modeling and compensation for robust speaker verification

In this study, the primary channel mismatch scenarios between enrollment and test conditions in a speaker verification task are analyzed and modeled. A novel frame-based compensation model for the mismatch, built on Gaussian mixture modeling with a universal background model (GMM-UBM), is formulated and evaluated using National Institute of Standards and Technology (NIST) speaker recognition evaluation (SRE) 2008 data, along with a comparison to the well-known eigenchannel model. The proposed compensation method shows significant improvement over the eigenchannel model when only the supervector of the UBM is employed. Here, the supervector of the enrollment speaker model is not included in the estimation of the mismatch, since it is difficult to obtain the true supervector of the speaker from only the limited 5 min of channel-dependent speech data. The proposed mismatch compensation model therefore shows that a supervector constructed from the UBM can more accurately describe the mismatch between enrollment and test data, resulting in an effective improvement in classification performance for speaker/speech applications.
