Psychoacoustic model compensation with robust feature set for speaker verification in additive noise

This paper addresses speaker verification in additive noise for resource-deficient languages. Psychoacoustic model compensation (Psy-Comp) has been shown to impart noise robustness to Gaussian Mixture Model (GMM) based speaker verification systems that use Mel Frequency Cepstral Coefficients (MFCCs). This work extends Psy-Comp to a more robust feature set that combines the MFCCs with Cepstral Mean Subtraction (CMS) and Δ coefficients. We propose a model-domain CMS operation, applied after the psychoacoustic compensation, for improved performance in additive noise. An advantage of this approach is that it requires no specialized development data, which may make it suitable for resource-deficient languages. Experiments on the NIST-2000 database corrupted with real-life street noise show improved performance with the proposed method.
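
For readers unfamiliar with the feature set named above, the following is a minimal sketch of conventional feature-domain CMS and Δ coefficients applied to an MFCC matrix. It is not the model-domain CMS proposed in this paper, which operates on GMM parameters after Psy-Comp and is not reproduced here. The function names, the regression window `width=2`, and the edge padding are illustrative assumptions, and the MFCC matrix is assumed to be precomputed with shape (frames, coefficients).

```python
import numpy as np


def cepstral_mean_subtraction(feats):
    """Subtract the per-utterance mean from every cepstral dimension."""
    return feats - feats.mean(axis=0, keepdims=True)


def delta(feats, width=2):
    """Regression-based delta coefficients over +/- `width` frames.

    Edge frames are handled by repeating the first and last frame
    (an assumption; other padding schemes are also common).
    """
    num_frames = feats.shape[0]
    padded = np.pad(feats, ((width, width), (0, 0)), mode="edge")
    denom = 2.0 * sum(k * k for k in range(1, width + 1))
    out = np.zeros(feats.shape, dtype=float)
    for k in range(1, width + 1):
        ahead = padded[width + k : width + k + num_frames]
        behind = padded[width - k : width - k + num_frames]
        out += k * (ahead - behind)
    return out / denom


def robust_features(mfcc, width=2):
    """Stack CMS-normalized static features with their deltas."""
    static = cepstral_mean_subtraction(mfcc)
    return np.hstack([static, delta(static, width)])


if __name__ == "__main__":
    # Hypothetical stand-in for real MFCCs: 50 frames, 13 coefficients.
    rng = np.random.default_rng(0)
    mfcc = rng.normal(size=(50, 13)) + 5.0
    feats = robust_features(mfcc)
    print(feats.shape)
```

Because the Δ coefficients are differences between frames, they are unaffected by a constant offset, so only the static half of the stacked vector changes under CMS.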
