Voice Biometrics over the Internet in the Framework of COST Action 275

The emerging field of biometric authentication over the Internet requires both robust person authentication and secure computer network protocols. This paper presents investigations of vocal biometric person authentication over the Internet, both at the protocol and authentication robustness levels. As part of this study, an appropriate client-server architecture for biometrics on the Internet is proposed and implemented. It is shown that the transmission of raw biometric data in this application is likely to result in unacceptably long delays in the process. On the other hand, by using data models (or features), the transmission time can be reduced to an acceptable level. The use of encryption/decryption for enhancing the data security in the proposed client-server link and its effects on the transmission time are also examined. Furthermore, the scope of the investigations includes an analysis of the effects of packet loss and speech coding on speaker verification performance. It is experimentally demonstrated that whilst the adverse effects of packet loss can be negligible, the encoding of speech, particularly at a low bit rate, can reduce the verification accuracy considerably. The paper details the experimental investigations conducted and presents an analysis of the results.

[1]  Donald F. Towsley,et al.  Adaptive FEC-based error control for Internet telephony , 1999, IEEE INFOCOM '99. Conference on Computer Communications. Proceedings. Eighteenth Annual Joint Conference of the IEEE Computer and Communications Societies. The Future is Now (Cat. No.99CH36320).

[2]  Ben P. Milner,et al.  Robust speech recognition over IP networks , 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100).

[3]  Bruce Schneier,et al.  Description of a New Variable-Length Key, 64-bit Block Cipher (Blowfish) , 1993, FSE.

[4]  Douglas A. Reynolds,et al.  A study of computation speed-UPS of the GMM-UBM speaker recognition system , 1999, EUROSPEECH.

[5]  Dominique Vaufreydaz,et al.  The effect of speech and audio compression on speech recognition performance , 2001, 2001 IEEE Fourth Workshop on Multimedia Signal Processing (Cat. No.01TH8564).

[6]  V. Hardman,et al.  A survey of packet loss recovery techniques for streaming audio , 1998, IEEE Network.

[7]  Alexander Fischer,et al.  Quantile based noise estimation for spectral subtraction and Wiener filtering , 2000, 2000 IEEE International Conference on Acoustics, Speech, and Signal Processing. Proceedings (Cat. No.00CH37100).

[8]  J. Abbate,et al.  Inventing the Internet , 1999 .

[9]  Douglas A. Reynolds,et al.  Speaker Verification Using Adapted Gaussian Mixture Models , 2000, Digit. Signal Process..

[10]  Darren Pearce,et al.  Enabling new speech driven services for mobile devices: An overview of the ETSI standards activities , 2000 .

[11]  Ralph Howard,et al.  Data encryption standard , 1987 .

[12]  Jérôme Boudy,et al.  Experiments with a nonlinear spectral subtractor (NSS), Hidden Markov models and the projection, for robust speech recognition in cars , 1991, Speech Commun..

[13]  J C Junqua,et al.  The Lombard reflex and its role on human listeners and automatic speech recognizers. , 1993, The Journal of the Acoustical Society of America.

[14]  Uyless Black Voice Over IP , 2001 .

[15]  J. E. Porter,et al.  Normalizations and selection of speech segments for speaker recognition scoring , 1988, ICASSP-88., International Conference on Acoustics, Speech, and Signal Processing.

[16]  Donald F. Towsley,et al.  Measurement and modelling of the temporal dependence in packet loss , 1999, IEEE INFOCOM '99. Conference on Computer Communications. Proceedings. Eighteenth Annual Joint Conference of the IEEE Computer and Communications Societies. The Future is Now (Cat. No.99CH36320).

[17]  Richard M. Schwartz,et al.  Enhancement of speech corrupted by acoustic noise , 1979, ICASSP.

[18]  Roland Auckenthaler,et al.  Score Normalization for Text-Independent Speaker Verification Systems , 2000, Digit. Signal Process..

[19]  Aladdin M. Ariyaeeinia,et al.  DATA TRANSMISSION IN BIOMETRICS OVER THE INTERNET , 2002 .

[20]  Gérard Chollet,et al.  The ELISA Systems for the NIST"99 Evaluation in Speaker Detection and Tracking , 1999 .

[21]  Chin-Hui Lee,et al.  Maximum a posteriori estimation for multivariate Gaussian mixture observations of Markov chains , 1994, IEEE Trans. Speech Audio Process..

[22]  Guillaume Gravier,et al.  Overview of the 2000-2001 ELISA Consortium research activities , 2001, Odyssey.

[23]  Hans-Günter Hirsch,et al.  Noise estimation techniques for robust speech recognition , 1995, 1995 International Conference on Acoustics, Speech, and Signal Processing.

[24]  Douglas A. Reynolds,et al.  Speaker identification and verification using Gaussian mixture speaker models , 1995, Speech Commun..

[25]  S. Boll,et al.  Suppression of acoustic noise in speech using spectral subtraction , 1979 .

[26]  Rhys James Jones,et al.  SpeechDat Cymru: A Large-scale Welsh Telephony Database , 2001 .

[27]  John Mason,et al.  Efficient real-time noise estimation without explicit speech, non-speech detection: an assessment on the AURORA corpus , 2002, 2002 14th International Conference on Digital Signal Processing Proceedings. DSP 2002 (Cat. No.02TH8628).

[28]  Frédéric Bimbot,et al.  A Monte-Carlo method for score normalization in Automatic Speaker Verification using Kullback-Leibler distances , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[29]  Laurent Besacier,et al.  Recovering of packet loss for Distributed Speech Recognition , 2002, 2002 11th European Signal Processing Conference.