Speechreading Using Probabilistic Models

A robust method for locating and tracking lips in gray-level image sequences is described. The method learns patterns of shape variability from a training set, which constrains the model during image search to deform only in ways similar to the training examples. Image search is guided by a learned gray-level model, which is used to describe the large appearance variability of lips. Such variability might be due to different individuals, illumination, mouth opening, specularity, or visibility of teeth and tongue. Visual speech features are recovered from the tracking results and represent both shape and intensity information. We describe a speechreading (lip reading) system in which the extracted features are modeled by Gaussian distributions and their temporal dependencies by hidden Markov models. Experimental results are presented for locating lips, tracking lips, and speechreading. The database used consists of a broad variety of speakers and was recorded in a natural environment, with no special lighting or lip markers. For a speaker-independent digit recognition task using visual information only, the system achieved an accuracy roughly equivalent to that of untrained humans.
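The idea of a shape model that "only deforms in ways similar to the training examples" can be illustrated with a point distribution model in the style of the active shape models of Cootes et al. The sketch below is a minimal illustration under several assumptions: the mean shape, orthonormal modes of variation, and their eigenvalues are taken as already learned (for example by PCA on aligned training shapes); pose alignment and the gray-level search step are omitted; and the ±3√λ clamp is a common choice in the active shape model literature, not a value taken from this paper. The function name `constrain_shape` and the two-landmark toy model are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def constrain_shape(
    x: Sequence[float],
    mean: Sequence[float],
    modes: Sequence[Sequence[float]],
    eigenvalues: Sequence[float],
    k: float = 3.0,
) -> Tuple[List[float], List[float]]:
    """Project a candidate shape onto a (hypothetical) learned shape model.

    Shapes are flattened landmark vectors (x1, y1, x2, y2, ...). `modes` are
    assumed orthonormal, so b_i = phi_i . (x - mean). Each b_i is clamped to
    [-k*sqrt(lambda_i), k*sqrt(lambda_i)] and the shape is rebuilt as
    mean + sum_i b_i * phi_i. Pose alignment is assumed to be done already.
    """
    diff = [xi - mi for xi, mi in zip(x, mean)]
    b: List[float] = []
    for phi, lam in zip(modes, eigenvalues):
        bi = sum(p * d for p, d in zip(phi, diff))  # projection onto mode i
        limit = k * math.sqrt(lam)
        b.append(max(-limit, min(limit, bi)))  # keep within the trained range
    shape = list(mean)
    for phi, bi in zip(modes, b):
        for j, p in enumerate(phi):
            shape[j] += bi * p  # add back only the allowed deformation
    return shape, b


# Hypothetical toy model: two landmarks (upper and lower lip midpoints) and
# a single mode of variation, "mouth opening", with eigenvalue 1.0.
s = 1.0 / math.sqrt(2.0)
mean = [0.0, 1.0, 0.0, -1.0]
modes = [[0.0, s, 0.0, -s]]
eigenvalues = [1.0]

# An implausibly wide opening plus a sideways shift the model cannot express.
candidate = [0.5, 5.0, 0.0, -5.0]
shape, b = constrain_shape(candidate, mean, modes, eigenvalues)
print(b)  # [3.0]: the opening parameter is clamped to 3 * sqrt(1.0)
print(shape)
```

The sideways shift of the upper lip lies outside the span of the learned modes, so projection removes it; the opening is kept but limited to the range seen during training.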

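The recognition stage, with Gaussian-modeled features and hidden Markov models for their temporal dependencies, can be sketched as isolated-word classification with one HMM per word: each model scores a feature sequence with the forward algorithm, and the best-scoring word is chosen. This is a minimal sketch under stated assumptions: each state emits a single diagonal Gaussian (a simplification; the exact emission densities and model topology are not given here), the parameters are hand-set toy values rather than Baum-Welch estimates, and the class, function, and word names are hypothetical. Scores are computed in the log domain to avoid numerical underflow on long sequences.

```python
import math
from typing import Dict, List, Sequence

NEG_INF = -math.inf


def safe_log(p: float) -> float:
    """log(p), mapping log(0) to -inf for forbidden starts or transitions."""
    return math.log(p) if p > 0.0 else NEG_INF


def logsumexp(values: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(v)))."""
    m = max(values)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))


def log_gauss_diag(x: Sequence[float], mean: Sequence[float], var: Sequence[float]) -> float:
    """Log density of a Gaussian with diagonal covariance."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


class GaussianHMM:
    """HMM with one diagonal Gaussian per state (parameters assumed trained)."""

    def __init__(self, start: List[float], trans: List[List[float]],
                 means: List[List[float]], variances: List[List[float]]):
        self.log_start = [safe_log(p) for p in start]
        self.log_trans = [[safe_log(p) for p in row] for row in trans]
        self.means = means
        self.variances = variances

    def _log_emit(self, x: Sequence[float], j: int) -> float:
        return log_gauss_diag(x, self.means[j], self.variances[j])

    def log_likelihood(self, obs: Sequence[Sequence[float]]) -> float:
        """Forward algorithm in the log domain: log P(obs | model)."""
        n = len(self.log_start)
        alpha = [self.log_start[j] + self._log_emit(obs[0], j) for j in range(n)]
        for x in obs[1:]:
            alpha = [
                logsumexp([alpha[i] + self.log_trans[i][j] for i in range(n)])
                + self._log_emit(x, j)
                for j in range(n)
            ]
        return logsumexp(alpha)


def recognise(obs: Sequence[Sequence[float]], models: Dict[str, GaussianHMM]) -> str:
    """Return the word whose model assigns the sequence the highest likelihood."""
    return max(models, key=lambda w: models[w].log_likelihood(obs))


# Hypothetical two-state left-to-right models over a 1-D visual feature.
left_to_right = [[0.5, 0.5], [0.0, 1.0]]
models = {
    "word_a": GaussianHMM([1.0, 0.0], left_to_right, [[0.0], [1.0]], [[0.1], [0.1]]),
    "word_b": GaussianHMM([1.0, 0.0], left_to_right, [[1.0], [0.0]], [[0.1], [0.1]]),
}
print(recognise([[0.0], [0.1], [0.9], [1.0]], models))  # word_a
print(recognise([[1.0], [0.9], [0.1], [0.0]], models))  # word_b
```

In a real system the feature vectors would be the shape and intensity parameters recovered by the lip tracker. Replacing the sum in the forward recursion with a maximum gives Viterbi decoding, which is useful when the single best state path is needed.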