Text Driven Face-Video Synthesis Using GMM and Spatial Correlation

Liveness detection is increasingly being planned for biometric systems to reduce the risk of spoofing and impersonation. Techniques in use include detecting head motion while posing or speaking, iris size under varying illumination, fingerprint sweat, text-prompted speech, and speech-to-lip-motion synchronization. In this paper, we propose a text-driven face-video synthesis method that produces a biometric signal for testing the attack resilience of biometric systems. We synthesize new, realistic-looking video sequences from real image sequences of a person uttering digits. The image sequence belonging to each digit is determined with a GMM-based speech recognizer. Then, given a system prompt (a sequence of digits), our method assembles a new video signal in the prompted order. This signal tests the attack resilience of a biometric system that asks for random digit utterances in order to prevent playback of pre-recorded audio and image data. The discontinuities that arise in the new image sequence where one digit is joined to the next are removed by a frame prediction algorithm based on the well-known block matching algorithm. Other uses of our results include web-based video communication for electronic commerce and frame interpolation for low-frame-rate video.
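The block-matching step mentioned above can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`sad`, `block_match`, `predict_midframe`), the block size, the search range, the sum-of-absolute-differences cost, and the simple averaging used to form the in-between frame are all assumptions chosen for illustration. Frames are assumed to be grayscale images stored as lists of rows of integers.

```python
def sad(prev, curr, py, px, cy, cx, size):
    """Sum of absolute differences between a block in prev and a block in curr."""
    total = 0
    for dy in range(size):
        for dx in range(size):
            total += abs(prev[py + dy][px + dx] - curr[cy + dy][cx + dx])
    return total


def block_match(prev, curr, block=4, search=2):
    """Exhaustive block matching (hypothetical helper).

    For every non-overlapping block in curr, find the displacement (dy, dx)
    within +/- search pixels that points to the best-matching block in prev.
    """
    h, w = len(curr), len(curr[0])
    motion = {}
    for by in range(0, h - block + 1, block):
        for bx in range(0, w - block + 1, block):
            # Start from zero displacement; replace only on a strictly lower cost.
            best = (sad(prev, curr, by, bx, by, bx, block), 0, 0)
            for dy in range(-search, search + 1):
                for dx in range(-search, search + 1):
                    py, px = by + dy, bx + dx
                    if 0 <= py <= h - block and 0 <= px <= w - block:
                        cost = sad(prev, curr, py, px, by, bx, block)
                        if cost < best[0]:
                            best = (cost, dy, dx)
            motion[(by, bx)] = (best[1], best[2])
    return motion


def predict_midframe(prev, curr, motion, block=4):
    """Assumed smoothing rule: average each curr block with its matched prev block."""
    mid = [row[:] for row in curr]
    for (by, bx), (dy, dx) in motion.items():
        for y in range(block):
            for x in range(block):
                a = prev[by + dy + y][bx + dx + x]
                b = curr[by + y][bx + x]
                mid[by + y][bx + x] = (a + b) // 2
    return mid
```

In a splicing scenario, `prev` would be the last frame of one digit clip and `curr` the first frame of the next; the predicted frame could then be inserted between them to soften the jump. A real system would likely use larger blocks, a faster search strategy, and motion-compensated placement of the blocks rather than plain averaging.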
