Real Time Lip Motion Analysis for a Person Authentication System using Near Infrared Illumination

In this paper we present an approach to lip motion analysis that can be used alongside a person authentication system based on face recognition, to prevent attacks that present a static photograph to the system. The work focuses on robustly tracking lips in grayscale images, which may be captured in the visible or near-infrared spectrum. We first locate the two lip corners in a face image, then extract suitable features from the mouth region to classify mouth states (visemes). The system achieves a classification accuracy above 85%. Temporal changes in the detected viseme classes can then be used to detect an impostor.
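
The final step, using temporal changes in the viseme classes to flag a static photograph, could look roughly like the sketch below. It is only an illustration under stated assumptions: the per-frame viseme labels are taken as given, and the function names, label strings, and the `min_transitions` threshold are hypothetical and not taken from the paper.

```python
from typing import Sequence


def count_viseme_transitions(labels: Sequence[str]) -> int:
    """Count how often the viseme label changes between consecutive frames."""
    return sum(1 for prev, cur in zip(labels, labels[1:]) if prev != cur)


def looks_live(labels: Sequence[str], min_transitions: int = 2) -> bool:
    """Hypothetical liveness rule: a photograph yields a constant mouth state,
    so a sequence is accepted only if the label changes often enough.
    The threshold is an assumed value, not one reported by the authors."""
    return count_viseme_transitions(labels) >= min_transitions


# Example: a speaking subject versus a static photograph.
speaking = ["closed", "closed", "open", "open", "closed"]
photo = ["closed"] * 10
print(looks_live(speaking), looks_live(photo))
```

In practice the threshold would have to be tuned against the classifier's error rate, since misclassified frames also produce spurious label changes.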
