ASSESSMENT OF SPEAKER VERIFICATION DEGRADATION DUE TO PACKET LOSS IN THE CONTEXT OF WIRELESS MOBILE DEVICES

This paper considers the adverse effects on speaker verification accuracy of two independent forms of speech signal degradation common in mobile communications: packet loss in the communications system and ambient noise at the wireless device. The effects of these degradations are assessed independently on a common database of 2000 speakers. Measured by equal error rate (EER), verification performance shows negligible degradation until more than 75% of the test feature vectors are lost, and the EER rises from 3% to just 5% when the loss reaches 88%. In contrast, adding a relatively small amount of noise to the test speech (15 dB SNR), under otherwise identical experimental conditions, raises the EER to 36%. In this latter case, simple speech enhancement reduces the EER to 21%. The main conclusion is that, for speech-based verification, typical packet loss is likely to incur negligible degradation in accuracy compared with the degradation associated with typical ambient noise conditions.
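To make the two degradations and the reported metric concrete, the following is a minimal Python sketch and not the authors' implementation. It assumes that packet loss can be approximated by discarding test feature vectors uniformly at random (real channels are often bursty), that noise is mixed into the waveform at a target SNR computed from average power, and that the EER is approximated by sweeping a threshold over the observed scores. All function names are hypothetical.

```python
import numpy as np


def drop_frames(features, loss_rate, rng):
    """Discard a fraction `loss_rate` of feature vectors (rows) at random.

    Assumption: losses are independent and uniform; bursty loss is not modelled.
    """
    n = features.shape[0]
    n_keep = max(1, int(round(n * (1.0 - loss_rate))))
    keep = np.sort(rng.choice(n, size=n_keep, replace=False))
    return features[keep]


def add_noise_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise power ratio is `snr_db`, then add it."""
    noise = noise[: len(speech)]
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    gain = np.sqrt(p_speech / (p_noise * 10.0 ** (snr_db / 10.0)))
    return speech + gain * noise


def equal_error_rate(genuine, impostor):
    """Approximate the EER: the point where false accepts equal false rejects."""
    thresholds = np.sort(np.concatenate([genuine, impostor]))
    frr = np.array([(genuine < t).mean() for t in thresholds])   # false rejection rate
    far = np.array([(impostor >= t).mean() for t in thresholds])  # false acceptance rate
    i = np.argmin(np.abs(far - frr))
    return 0.5 * (far[i] + frr[i])


rng = np.random.default_rng(0)

# 88% loss on 100 synthetic 13-dimensional feature vectors leaves 12 of them.
feats = rng.standard_normal((100, 13))
kept = drop_frames(feats, loss_rate=0.88, rng=rng)

# Mix synthetic noise into a synthetic signal at 15 dB SNR.
speech = rng.standard_normal(16000)
noise = rng.standard_normal(16000)
noisy = add_noise_at_snr(speech, noise, snr_db=15.0)
```

In such a setup, verification scores would be recomputed on the degraded test data and passed to `equal_error_rate` as separate arrays of genuine and impostor trial scores.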
