Co-inertia analysis for "liveness" test in audio-visual biometrics

In biometrics, it is crucial to detect impostors and thwart replay attacks. However, little research has so far addressed "liveness" verification. This test ensures that the biometric cues being acquired are actual measurements from a live person who is present at the time of capture. Here, we propose a speaker-independent "liveness" verification method for audio-video identification systems. It exploits the correlation between lip movements and the speech produced. Two data analysis methods are considered to model this statistical link. In tests carried out on the XM2VTS database, the best liveness verification equal error rate (EER) achieved is 12.5%.
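To make the idea of scoring audio-visual correlation concrete, here is a minimal sketch of co-inertia analysis on synthetic data. It is an illustration under stated assumptions, not the procedure evaluated in this work: the feature dimensions, the noise level, the helper names (`coinertia_axes`, `liveness_score`, `make_sequence`) and the use of absolute correlation as the score are all hypothetical choices. Co-inertia analysis looks for one direction in each feature space such that the projections have maximal covariance; these directions are the leading singular vectors of the cross-covariance matrix.

```python
import numpy as np

def coinertia_axes(audio, visual):
    """Return unit vectors (a, b) maximizing cov(audio @ a, visual @ b)."""
    A = audio - audio.mean(axis=0)
    V = visual - visual.mean(axis=0)
    cross_cov = A.T @ V / len(A)  # shape: (audio_dim, visual_dim)
    U, _, Vt = np.linalg.svd(cross_cov, full_matrices=False)
    return U[:, 0], Vt[0]

def liveness_score(audio, visual, a, b):
    """Absolute correlation between the projected audio and visual streams."""
    pa = (audio - audio.mean(axis=0)) @ a
    pv = (visual - visual.mean(axis=0)) @ b
    return abs(np.corrcoef(pa, pv)[0, 1])

# Synthetic stand-in for real features (assumption): one shared latent
# "articulation" signal drives 4 audio and 3 lip-shape features, plus noise.
rng = np.random.default_rng(0)
W_AUDIO = np.array([1.0, -0.5, 0.3, 0.8])
W_VISUAL = np.array([0.7, 0.2, -0.6])

def make_sequence(n_frames):
    latent = rng.standard_normal(n_frames)
    audio = np.outer(latent, W_AUDIO) + 0.5 * rng.standard_normal((n_frames, 4))
    visual = np.outer(latent, W_VISUAL) + 0.5 * rng.standard_normal((n_frames, 3))
    return audio, visual

# Estimate the co-inertia axes on one synchronized sequence.
train_audio, train_visual = make_sequence(500)
a, b = coinertia_axes(train_audio, train_visual)

# Genuine access: audio and video come from the same recording.
test_audio, test_visual = make_sequence(500)
genuine_score = liveness_score(test_audio, test_visual, a, b)

# Simulated replay attack: the audio is paired with unrelated video.
_, other_visual = make_sequence(500)
impostor_score = liveness_score(test_audio, other_visual, a, b)

print(f"genuine:  {genuine_score:.2f}")
print(f"impostor: {impostor_score:.2f}")
```

A decision threshold on such a score would separate synchronized from unsynchronized recordings; the EER is the operating point at which the false acceptance and false rejection rates are equal.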
