Speechreading: an overview of image processing, feature extraction, sensory integration and pattern recognition techniques

We give an overview of speechreading systems from the perspective of the face and gesture recognition community, paying particular attention to approaches to key design decisions and their benefits and drawbacks. We discuss the central issue of sensory integration: how much processing of the acoustic and the visual information should occur before integration, and how should the two be integrated? We describe several possible practical applications and conclude with a list of important outstanding problems that seem amenable to attack using techniques developed in the face and gesture recognition community.
