Overview of compression and packet loss effects in speech biometrics

An overview is presented of compression and packet loss effects in speech biometrics. These new problems appear particularly in recent applications of biometrics over mobile or Internet networks. The influence of speech compression on speaker recognition performance in mobile networks is investigated. In a first experiment, it is found that the use of GSM coding degrades the performance. In a second experiment, the features for the speaker recognition system are calculated directly from the information available in the encoded bit stream. It is found that a low LPC order in GSM coding is responsible for most performance degradations. A speaker recognition system was obtained which is equivalent in performance to the original one which decodes and reanalyses speech before performing recognition. The joint packet loss and compression effects over IP networks are also studied. It is experimentally demonstrated that the adverse effects of packet loss alone are negligible, while the encoding of speech, particularly at a low bit rate, coupled with packet loss, can reduce the verification accuracy considerably.

[1]  Roland Auckenthaler,et al.  Score Normalization for Text-Independent Speaker Verification Systems , 2000, Digit. Signal Process..

[2]  Andrzej Drygajlo,et al.  Speaker verification with noisy GSM quality speech , 1999 .

[3]  Dominique Vaufreydaz,et al.  The effect of speech and audio compression on speech recognition performance , 2001, 2001 IEEE Fourth Workshop on Multimedia Signal Processing (Cat. No.01TH8564).

[4]  Chafic Mokbel,et al.  Solutions for robust recognition over the GSM cellular network , 1998, Proceedings of the 1998 IEEE International Conference on Acoustics, Speech and Signal Processing, ICASSP '98 (Cat. No.98CH36181).

[5]  Nicholas W. D. Evans,et al.  ASSESSMENT OF SPEAKER VERIFICATION DEGRADATION DUE TO PACKET LOSS IN THE CONTEXT OF WIRELESS MOBILE DEVICES , 2002 .

[6]  Douglas A. Reynolds,et al.  Speaker and language recognition using speech codec parameters , 1999, EUROSPEECH.

[7]  Philip Lockwood,et al.  Evaluation of root-normalised front-end (RN LFCC) for speech recognition in wireless GSM network environments , 1996, 1996 IEEE International Conference on Acoustics, Speech, and Signal Processing Conference Proceedings.

[8]  Chin-Hui Lee,et al.  Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains , 1994, IEEE Trans. Speech Audio Process..

[9]  Donald F. Towsley,et al.  Measurement and modelling of the temporal dependence in packet loss , 1999, IEEE INFOCOM '99. Conference on Computer Communications. Proceedings. Eighteenth Annual Joint Conference of the IEEE Computer and Communications Societies. The Future is Now (Cat. No.99CH36320).

[10]  C. Broun,et al.  Distributed speaker recognition using the ETSI distributed speech recognition standard , 2001 .

[11]  Ivan Magrin-Chagnolleau,et al.  Second-order statistical measures for text-independent speaker identification , 1995, Speech Commun..

[12]  Gérard Chollet,et al.  The ELISA Systems for the NIST"99 Evaluation in Speaker Detection and Tracking , 1999 .

[13]  Lou Boves,et al.  Speaker verification with GSM coded telephone speech , 1997, EUROSPEECH.

[14]  Uyless Black Voice Over IP , 2001 .

[15]  J. E. Porter,et al.  Normalizations and selection of speech segments for speaker recognition scoring , 1988, ICASSP-88., International Conference on Acoustics, Speech, and Signal Processing.

[16]  Frédéric Bimbot,et al.  A Monte-Carlo method for score normalization in Automatic Speaker Verification using Kullback-Leibler distances , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[17]  Laurent Besacier,et al.  Recovering of packet loss for Distributed Speech Recognition , 2002, 2002 11th European Signal Processing Conference.

[18]  Douglas A. Reynolds,et al.  Speaker identification and verification using Gaussian mixture speaker models , 1995, Speech Commun..

[19]  Donald F. Towsley,et al.  Adaptive FEC-based error control for Internet telephony , 1999, IEEE INFOCOM '99. Conference on Computer Communications. Proceedings. Eighteenth Annual Joint Conference of the IEEE Computer and Communications Societies. The Future is Now (Cat. No.99CH36320).

[20]  Richard M. Stern,et al.  Speech recognition from GSM codec parameters , 1998, ICSLP.

[21]  Guillaume Gravier,et al.  Overview of the 2000-2001 ELISA Consortium research activities , 2001, Odyssey.

[22]  Carmen Peláez-Moreno,et al.  Recognizing voice over IP: a robust front-end for speech recognition on the world wide web , 2001, IEEE Trans. Multim..