Robust speaker identification under noisy conditions using feature compensation and signal to noise ratio estimation

Biometric speaker identification systems used in wireless remote access security, forensics, electronic commerce and surveillance increasingly need to be robust to noise. This paper examines robustness to additive white noise at signal-to-noise ratios (SNRs) ranging from 0 to 30 dB. The classifier is a Gaussian mixture model obtained by adapting a universal background model. The system is trained on clean speech and tested on both clean and noisy speech. To mitigate the performance loss caused by mismatched training and testing conditions, five robust features, feature compensation and decision-level fusion strategies are used. The feature compensation relies on a blind estimate of the SNR of the test speech, which is used to select an affine transform from a repertoire. A two-way analysis of variance compares the experimental scenarios (benchmark, control and practical) and the individual features and their fusion at each SNR. The practical scenario is always statistically better than the benchmark and is sometimes equivalent to the control scenario.
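
The feature compensation step described in the abstract (estimate the signal-to-noise ratio (SNR) of the test speech, then pick an affine transform from a repertoire) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the nearest-SNR selection rule, the function names `select_transform` and `compensate`, the repertoire keyed by nominal SNR in dB, and the toy transform values. The blind SNR estimator itself is not shown; its output is simply passed in as `snr_db`.

```python
import numpy as np


def select_transform(snr_db, repertoire):
    """Return the (A, b) pair whose nominal SNR is closest to the estimate.

    Nearest-SNR selection is an assumed rule, used only for illustration.
    """
    nearest = min(repertoire, key=lambda nominal: abs(nominal - snr_db))
    return repertoire[nearest]


def compensate(features, snr_db, repertoire):
    """Apply the affine map x_hat = A x + b to every frame (row) of `features`."""
    A, b = select_transform(snr_db, repertoire)
    return features @ A.T + b


# Hypothetical repertoire: one affine transform per nominal SNR (in dB).
# In practice each pair would be learned from data; these values are toy numbers.
dim = 3
repertoire = {
    0: (2.0 * np.eye(dim), np.full(dim, -1.0)),
    10: (1.5 * np.eye(dim), np.full(dim, -0.5)),
    20: (1.1 * np.eye(dim), np.full(dim, -0.1)),
    30: (np.eye(dim), np.zeros(dim)),
}

# Four toy feature frames; snr_db stands in for the blind SNR estimate.
noisy = np.ones((4, dim))
compensated = compensate(noisy, snr_db=12.3, repertoire=repertoire)
```

With an estimate of 12.3 dB, the sketch selects the transform stored under 10 dB and applies it to all frames. A real system could instead interpolate between neighbouring transforms; the hard selection above is only the simplest option.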
