Speaker verification in noisy environment using GMM supervectors

This paper explores the combined GMM-SVM approach to text-independent speaker verification in noisy environments. In recent years, supervectors constructed by stacking the means of adapted Gaussian Mixture Models (GMMs) have been used successfully to derive sequence kernels, and Support Vector Machines (SVMs) trained with such kernels provide a further improvement in classification accuracy. Our study analyzes how such hybrid systems behave on simulated noisy data. We use the KL-divergence kernel and the GMM-UBM mean interval kernel for SVM training. All experiments are conducted on the NIST-SRE-2003 database, with training and test utterances degraded by noises (car, factory and pink) taken from the NOISEX-92 database at SNRs of 5 dB and 10 dB. A significant performance improvement is observed in comparison with the traditional GMM-UBM based system.
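The supervector construction and the KL-divergence kernel mentioned above can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes a diagonal-covariance UBM, mean-only MAP adaptation, and a relevance factor of 16, none of which the abstract specifies, and the function names (`map_adapt_means`, `kl_supervector`, `kl_kernel`) are hypothetical. The kernel is the commonly used linear form K(a, b) = sum_i w_i * mu_i^a' * Sigma_i^{-1} * mu_i^b, obtained by scaling each adapted mean before stacking. The GMM-UBM mean interval kernel is not shown.

```python
import numpy as np

def map_adapt_means(X, weights, means, variances, relevance=16.0):
    """Mean-only MAP adaptation of a diagonal-covariance UBM to frames X (T x D).

    weights: (C,), means: (C, D), variances: (C, D).
    relevance=16.0 is an assumed value, not taken from the paper.
    """
    # Log-density of every frame under every Gaussian, shape (T, C).
    diff = X[:, None, :] - means[None, :, :]
    log_gauss = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                        + np.sum(np.log(2.0 * np.pi * variances), axis=1))
    # Component posteriors per frame, normalised in a numerically stable way.
    log_post = np.log(weights) + log_gauss
    log_post -= log_post.max(axis=1, keepdims=True)
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)
    # Zeroth- and first-order sufficient statistics.
    n = post.sum(axis=0)                           # (C,)
    ex = (post.T @ X) / np.maximum(n, 1e-10)[:, None]  # (C, D)
    # Data-dependent interpolation between the data mean and the UBM mean.
    alpha = (n / (n + relevance))[:, None]
    return alpha * ex + (1.0 - alpha) * means

def kl_supervector(adapted_means, weights, variances):
    """Stack the adapted means, scaled by sqrt(w_i) * Sigma_i^(-1/2)."""
    scaled = np.sqrt(weights)[:, None] * adapted_means / np.sqrt(variances)
    return scaled.ravel()

def kl_kernel(sv_a, sv_b):
    """Linear kernel between two scaled supervectors."""
    return float(sv_a @ sv_b)

# Toy usage with random features standing in for cepstral frames.
rng = np.random.default_rng(0)
C, D = 4, 3                                  # assumed toy sizes
ubm_w = np.full(C, 1.0 / C)
ubm_mu = rng.normal(size=(C, D))
ubm_var = np.ones((C, D))
utt_a = rng.normal(size=(200, D))
utt_b = rng.normal(size=(150, D)) + 0.5
sv_a = kl_supervector(map_adapt_means(utt_a, ubm_w, ubm_mu, ubm_var), ubm_w, ubm_var)
sv_b = kl_supervector(map_adapt_means(utt_b, ubm_w, ubm_mu, ubm_var), ubm_w, ubm_var)
print(kl_kernel(sv_a, sv_b))
```

Because the scaling is folded into the supervector, the kernel reduces to a dot product, so a linear SVM can be trained directly on the stacked vectors.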
