Robust speaker recognition against background noise in an enhanced multi-condition domain

Background noise considerably degrades the performance of speaker recognition (SR) systems. Because speech enhancement (SE) cannot fully remove real-world noise without modifying the clean speech signal, an SR model trained only on clean speech cannot fully represent evaluation data that include various noisy signals preprocessed by SE. To compensate for this problem, we apply SE as a preprocessing block not only in the evaluation stage but also in the training stage. Moreover, we propose combining SE with multi-condition (MC) training (SE-MC), in which various sets of features are extracted in the SE domain and a model for each speaker is trained on the mixture of SE-domain features. To estimate the model mismatch between training and evaluation data, we also propose an intra Kullback-Leibler distance (intra-KLD) measure. Based on the intra-KLD, the performance of SR systems using SE and MC training can be predicted with reduced computational complexity. Under various background noise environments, SE, MC, and SE-MC produced SR error rates of 43.51%, 25.00%, and 20.29%, respectively.
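The SE-MC idea described in the abstract (enhance every training condition, then pool the enhanced-domain features for one speaker) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the function names, the frame sizes, the log-magnitude features, the inclusion of the clean condition in the pool, and in particular the SE block, which is a basic magnitude spectral subtraction that estimates noise from the first few frames.

```python
import numpy as np


def mix_at_snr(clean, noise, snr_db):
    """Add noise to clean speech, scaled so the mixture has the requested SNR in dB."""
    noise = np.resize(noise, clean.shape)  # repeat or truncate noise to match length
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    gain = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return clean + gain * noise


def frame_magnitudes(signal, frame_len=256, hop=128):
    """Split the signal into Hann-windowed frames and return magnitude spectra."""
    n_frames = 1 + (len(signal) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack(
        [signal[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1))


def enhance(mag, noise_frames=5, floor=0.05):
    """Placeholder SE block (assumption): magnitude spectral subtraction.

    The noise spectrum is estimated as the mean of the first `noise_frames`
    frames, and a spectral floor keeps the result strictly non-negative.
    """
    noise_est = mag[:noise_frames].mean(axis=0)
    return np.maximum(mag - noise_est, floor * mag)


def se_features(signal):
    """SE-domain features: log of the enhanced magnitude spectrum."""
    return np.log(enhance(frame_magnitudes(signal)) + 1e-8)


def se_mc_training_features(clean, noises, snrs_db):
    """Pool SE-domain features over the clean signal and every noise/SNR condition."""
    conditions = [clean]
    for noise in noises:
        for snr_db in snrs_db:
            conditions.append(mix_at_snr(clean, noise, snr_db))
    # Each condition passes through the same SE block that would be used at evaluation.
    return np.vstack([se_features(x) for x in conditions])
```

The pooled feature matrix for a speaker would then be passed to whatever per-speaker model trainer the system uses; the abstract does not specify the model type. The point of the sketch is only that the training data go through the same SE preprocessing as the evaluation data, across multiple noise conditions.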
