Lipreading Using Fourier Transform over Time

This paper describes a novel approach to visual speech recognition. The intensity of each pixel in an image sequence is treated as a function of time, and a one-dimensional Fourier transform is applied to each intensity-versus-time signal to model the lip movements. We present experimental results on two databases, covering the ten English digits and the English letters, respectively.
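The core idea can be sketched as follows: stack the frames into a time series per pixel, take a 1-D FFT along the time axis, and keep low-frequency magnitude coefficients as features. This is a minimal illustration of the general technique, not the paper's exact pipeline; the function name, the number of coefficients kept, and the use of magnitudes only are assumptions for the sketch.

```python
import numpy as np

def temporal_fft_features(frames, n_coeffs=8):
    """Per-pixel 1-D Fourier features over time for an image sequence.

    frames: array of shape (T, H, W) -- pixel intensities over T frames.
    Returns the magnitudes of the n_coeffs lowest-frequency components,
    shape (n_coeffs, H, W). (Hypothetical helper, not from the paper.)
    """
    frames = np.asarray(frames, dtype=float)
    # FFT along the time axis: each pixel's intensity-vs-time function
    spectrum = np.fft.rfft(frames, axis=0)
    # Low-frequency magnitudes summarize the temporal intensity change
    return np.abs(spectrum[:n_coeffs])

# Example: 16 frames of a 4x4 patch whose intensity varies sinusoidally
T, H, W = 16, 4, 4
t = np.arange(T)
frames = np.sin(2 * np.pi * t / T)[:, None, None] * np.ones((T, H, W))
feats = temporal_fft_features(frames, n_coeffs=4)
print(feats.shape)  # (4, 4, 4)
```

For a pure sinusoid at one cycle per sequence, the energy concentrates in frequency bin 1, so the feature vector at each pixel is sparse; real lip sequences spread energy over a few low-frequency bins, which is what makes a truncated Fourier representation compact.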
