Automatic Lipreading Research: Historic Overview and Current Work

Acoustic automatic speech recognition (ASR) systems tend to perform poorly on noisy speech. Unfortunately, most application environments contain noise from machines, vehicles, other people talking, typing, television, sound systems, and so on. In addition, system performance depends heavily on the microphone type and its placement, yet most people find head-mounted microphones uncomfortable for extended use, and such microphones are impractical in many situations. Fortunately, visual speech information (lipreading or, more properly, speechreading) has been shown to improve the performance of acoustic ASR systems, especially in noise. This paper outlines the history of automatic lipreading research and describes the authors' current efforts.
