Lip Analysis for Person Recognition

The human face is an attractive biometric identifier, and face recognition has improved considerably since its beginnings some three decades ago, yet its application in the real world has achieved only limited success. In this doctoral dissertation we focus on a local feature of the human face, namely the lip, and analyse its relevance to and influence on person recognition. An in-depth study is carried out of the various steps involved: detection, evaluation, normalization, and applications of human lip motion. We first present a lip detection algorithm based on the fusion of two independent methods. The first method is based on edge detection and the second on region segmentation; each has distinct characteristics and thus exhibits different strengths and weaknesses. We exploit these strengths by combining the two methods through fusion. We then present results from extensive testing and evaluation of the detection algorithm on a realistic database. Next we compare visual features of lip motion for their relevance to person recognition. For this purpose we extract various geometric and appearance-based lip features and compare them using three feature selection measures: Minimal-Redundancy-Maximum-Relevance, Bhattacharyya distance, and mutual information. We then extract features that model the behavioural aspect of lip motion during speech and exploit them for person recognition. The behavioural features include static features, such as the normalized lengths of the major and minor axes and the coordinates of lip extrema points, and dynamic features based on optical flow. These features are used to build client models with Gaussian Mixture Models (GMMs), and classification is finally achieved using a Bayesian decision rule. Recognition results are then presented on a text-independent database specifically designed for testing behavioural features, which require comparatively more data.
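The GMM client models and the Bayesian decision rule mentioned above can be illustrated with a minimal sketch. It is not the dissertation's implementation: the diagonal covariances, the equal priors, the two-dimensional feature vectors, and all names and numbers (`client_models`, `client_a`, `client_b`, `test_frames`) are assumptions chosen for illustration, and the mixture parameters are given directly rather than trained.

```python
import math

def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )

def gmm_log_likelihood(frames, gmm):
    """Sum of per-frame log-likelihoods; gmm is a list of (weight, mean, var)."""
    total = 0.0
    for x in frames:
        comp = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in gmm]
        peak = max(comp)
        # log-sum-exp over mixture components for numerical stability
        total += peak + math.log(sum(math.exp(c - peak) for c in comp))
    return total

def classify(frames, client_models, priors=None):
    """Bayesian decision rule: choose the client with the highest log posterior."""
    names = list(client_models)
    if priors is None:
        # assumption: equal prior probability for every client
        priors = {name: 1.0 / len(names) for name in names}
    scores = {
        name: gmm_log_likelihood(frames, client_models[name]) + math.log(priors[name])
        for name in names
    }
    return max(scores, key=scores.get), scores

# Hypothetical client models over 2-D static lip features
# (for example, normalized major and minor axis lengths).
client_models = {
    "client_a": [(0.6, [1.0, 0.4], [0.02, 0.01]), (0.4, [1.2, 0.5], [0.03, 0.02])],
    "client_b": [(0.5, [0.7, 0.3], [0.02, 0.01]), (0.5, [0.8, 0.2], [0.02, 0.02])],
}
test_frames = [[1.05, 0.42], [1.15, 0.48], [0.98, 0.41]]
best, scores = classify(test_frames, client_models)
print(best, scores)
```

Summing per-frame log-likelihoods treats frames as independent given the client, which is the usual simplification when scoring a sequence against a GMM; with equal priors the rule reduces to maximum likelihood.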
Lastly, we propose a temporal normalization method to compensate for variation caused by lip motion during speech. Given a group of videos of a person uttering the same sentence multiple times, we study the lip motion in one of the videos and select certain key frames as synchronization frames. We then synchronize these frames from the first video with the remaining videos of the same person. Finally, all the videos are normalized temporally by interpolation using lip morphing. To evaluate our normalization algorithm, we have devised a spatio-temporal person recognition algorithm that compares normalized and un-normalized videos.
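The idea of aligning synchronization frames and interpolating between them can be sketched as follows. This is only an illustration under stated assumptions: each frame is reduced to a vector of lip feature values, the synchronization frame indices are taken as already known, and plain linear interpolation stands in for the lip morphing used in the dissertation. The function names `resample_segment` and `temporally_normalize` are hypothetical.

```python
def resample_segment(frames, n_out):
    """Resample feature vectors to n_out samples, keeping first and last."""
    if n_out == 1:
        return [list(frames[0])]
    last = len(frames) - 1
    out = []
    for k in range(n_out):
        pos = k * last / (n_out - 1)          # fractional source position
        i = min(int(pos), last - 1) if last > 0 else 0
        t = pos - i
        nxt = frames[min(i + 1, last)]
        # linear interpolation (assumption: stands in for lip morphing)
        out.append([a + t * (b - a) for a, b in zip(frames[i], nxt)])
    return out

def temporally_normalize(frames, sync_idx, ref_idx):
    """Warp frames so that frames[sync_idx[j]] lands at output index ref_idx[j]."""
    if len(sync_idx) != len(ref_idx):
        raise ValueError("sync_idx and ref_idx must have the same length")
    out = []
    for j in range(len(sync_idx) - 1):
        segment = frames[sync_idx[j]: sync_idx[j + 1] + 1]
        n_out = ref_idx[j + 1] - ref_idx[j] + 1
        warped = resample_segment(segment, n_out)
        # later segments start with a copy of the previous segment's last frame
        out.extend(warped if j == 0 else warped[1:])
    return out

# Hypothetical 5-frame video with one feature per frame; its synchronization
# frames (0, 2, 4) are mapped onto reference positions (0, 4, 6).
video = [[0.0], [1.0], [2.0], [3.0], [4.0]]
normalized = temporally_normalize(video, sync_idx=[0, 2, 4], ref_idx=[0, 4, 6])
print(normalized)
```

Resampling each segment separately keeps every synchronization frame exactly at its reference position, so all videos of the same sentence end up with the same length and the same key-frame timing.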
