Detecting Replay Attacks in Audiovisual Identity Verification

We describe an algorithm that detects a lack of correspondence between speech and lip motion by detecting and monitoring the degree of synchrony between live audio and visual signals. It is simple, effective, and computationally inexpensive; providing a useful degree of robustness against basic replay attacks and against speech or image forgeries. The method is based on a cross-correlation analysis between two streams of features, one from the audio signal and the other from the image sequence. We argue that such an algorithm forms an effective first barrier against several kinds of replay attack that would defeat existing verification systems based on standard multimodal fusion techniques. In order to provide an evaluation mechanism for the new technique we have augmented the protocols that accompany the BANCA multimedia corpus by defining new scenarios. We obtain 0% equal-error rate (EER) on the simplest scenario and 35% on a more challenging one

[1]  E. Mayoraz,et al.  Fusion of face and speech data for person identity verification , 1999, IEEE Trans. Neural Networks.

[2]  J. van Leeuwen,et al.  Audio- and Video-Based Biometric Person Authentication , 2001, Lecture Notes in Computer Science.

[3]  Martin Heckmann,et al.  Noise Adaptive Stream Weighting in Audio-Visual Speech Recognition , 2002, EURASIP J. Adv. Signal Process..

[4]  Gérard Chollet,et al.  BIOMET: A Multimodal Person Authentication Database Including Face, Voice, Fingerprint, Hand and Signature Modalities , 2003, AVBPA.

[5]  Chalapathy Neti,et al.  Recent advances in the automatic recognition of audiovisual speech , 2003, Proc. IEEE.

[6]  Jean-Philippe Thiran,et al.  The BANCA Database and Evaluation Protocol , 2003, AVBPA.

[7]  Michael Wagner,et al.  "liveness" Verification in Audio-video Authentication , 2004, INTERSPEECH.

[8]  Gérard Chollet,et al.  Voice forgery using ALISP: indexation in a client memory , 2005, Proceedings. (ICASSP '05). IEEE International Conference on Acoustics, Speech, and Signal Processing, 2005..

[9]  B. Abboud,et al.  Appearance based lip tracking and cloning on speaking faces , 2005, ISPA 2005. Proceedings of the 4th International Symposium on Image and Signal Processing and Analysis, 2005..

[10]  G. Aguilar,et al.  Multimodal biometric system using fingerprint , 2007, 2007 International Conference on Intelligent and Advanced Systems.