Lip motion features for biometric person recognition

The present chapter reports on the use of lip motion as a stand-alone biometric modality and as a modality integrated with audio speech for identity recognition, using digit recognition as a supporting task. First, the authors estimate motion vectors from images of lip movements. The motion is modeled as the distribution of apparent line velocities in the movement of brightness patterns in the image. The authors then construct compact lip-motion features from the regional statistics of the local velocities. These features can be used alone or merged with audio features to recognize the identity of the speaker or the uttered digit. The authors present person recognition results on the XM2VTS database, which contains video and audio data of 295 people. They also present digit recognition results for a text-prompted mode that verifies the liveness of the user; such user challenges are intended to reduce the risk of replay attacks on the audio system.
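As a rough illustration of this pipeline, the sketch below estimates dense optical flow over a cropped lip region and summarizes it with per-cell regional statistics. It is a minimal, hypothetical example: the input sequence `lip_roi_frames`, the grid layout, and the use of OpenCV's Farneback flow are assumptions standing in for the chapter's line-velocity (structure-tensor-based) estimator, not the authors' implementation.

```python
# Hypothetical sketch of lip-motion feature extraction:
# dense optical flow on a lip region of interest, summarized by
# regional mean/std statistics of the local velocities.
import cv2
import numpy as np

def lip_motion_features(lip_roi_frames, grid=(3, 3)):
    """Return one feature vector per frame pair (names are illustrative)."""
    features = []
    prev = cv2.cvtColor(lip_roi_frames[0], cv2.COLOR_BGR2GRAY)
    for frame in lip_roi_frames[1:]:
        curr = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        # Dense optical flow: apparent velocity (vx, vy) at every pixel.
        # Farneback flow is used here as a stand-in motion estimator.
        flow = cv2.calcOpticalFlowFarneback(
            prev, curr, None,
            pyr_scale=0.5, levels=3, winsize=15,
            iterations=3, poly_n=5, poly_sigma=1.2, flags=0)
        h, w = flow.shape[:2]
        stats = []
        # Regional statistics: mean and std of vx and vy in each grid cell
        # form a compact descriptor of the lip motion.
        for i in range(grid[0]):
            for j in range(grid[1]):
                cell = flow[i * h // grid[0]:(i + 1) * h // grid[0],
                            j * w // grid[1]:(j + 1) * w // grid[1]]
                stats.extend([cell[..., 0].mean(), cell[..., 0].std(),
                              cell[..., 1].mean(), cell[..., 1].std()])
        features.append(np.array(stats))
        prev = curr
    return np.stack(features)
```

The resulting per-frame vectors could then be used on their own or concatenated with audio features for identity or digit recognition, in the spirit of the fusion described above.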
