Real-time lip-tracking for lipreading

This paper presents a new approach to lip tracking for lipreading. Instead of tracking only features on the lips, we propose to track the lips together with other facial features such as the pupils and nostrils. In the new approach, the face is first located in an image using a stochastic skin-color model; the eyes, lip corners and nostrils are then located and tracked inside the facial region. This approach effectively improves the robustness of lip tracking and simplifies the automatic detection of, and recovery from, tracking failures. The feasibility of the approach has been demonstrated by implementing a lip-tracking system. The system has been tested on a database of 900 image sequences of different speakers spelling words, from which it successfully extracted lip regions to obtain training data for an audio-visual speech recognition system. The system has also been applied to extract the lip region in real time from live video, providing the visual input for an audio-visual speech recognition system. On test sequences, detecting and predicting outliers in the set of found features reduced the number of frames with tracking failures by a factor of two.
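The abstract does not specify the form of the skin-color model or of the outlier test, so the following is only a minimal sketch under stated assumptions. It assumes the stochastic skin-color model is a bivariate Gaussian over normalized chromatic coordinates r = R/(R+G+B), g = G/(R+G+B), and it approximates the facial region by the bounding box of all skin pixels rather than a connected-component search. For outliers, it assumes a constant-velocity prediction from the two previous frames and a fixed pixel threshold. All names (`chromatic`, `SkinColorModel`, `locate_face`, `reject_outliers`, `threshold`, `max_dev`) are hypothetical and chosen for illustration.

```python
import math
from typing import List, Optional, Sequence, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B), each 0..255
Point = Tuple[float, float]    # (x, y) image position of a facial feature


def chromatic(p: Pixel) -> Tuple[float, float]:
    """Map an RGB pixel to normalized chromatic coordinates (r, g)."""
    s = p[0] + p[1] + p[2]
    if s == 0:
        return (1.0 / 3.0, 1.0 / 3.0)  # treat black as neutral grey
    return (p[0] / s, p[1] / s)


class SkinColorModel:
    """Assumed model: bivariate Gaussian over (r, g), fitted from skin samples."""

    def __init__(self, samples: Sequence[Pixel]):
        pts = [chromatic(p) for p in samples]
        n = len(pts)
        self.mr = sum(r for r, _ in pts) / n
        self.mg = sum(g for _, g in pts) / n
        # Covariance with a small ridge on the diagonal so it stays invertible.
        srr = sum((r - self.mr) ** 2 for r, _ in pts) / n + 1e-6
        sgg = sum((g - self.mg) ** 2 for _, g in pts) / n + 1e-6
        srg = sum((r - self.mr) * (g - self.mg) for r, g in pts) / n
        det = srr * sgg - srg * srg
        # Entries (a, b, c) of the symmetric inverse [[a, b], [b, c]].
        self.inv = (sgg / det, -srg / det, srr / det)

    def mahalanobis2(self, p: Pixel) -> float:
        """Squared Mahalanobis distance of a pixel's chromaticity from the mean."""
        r, g = chromatic(p)
        dr, dg = r - self.mr, g - self.mg
        a, b, c = self.inv
        return a * dr * dr + 2 * b * dr * dg + c * dg * dg

    def is_skin(self, p: Pixel, threshold: float = 9.0) -> bool:
        # Hypothetical threshold on the squared distance.
        return self.mahalanobis2(p) <= threshold


def locate_face(image: List[List[Pixel]],
                model: SkinColorModel) -> Optional[Tuple[int, int, int, int]]:
    """Inclusive bounding box (top, left, bottom, right) of all skin pixels."""
    hits = [(y, x) for y, row in enumerate(image)
            for x, p in enumerate(row) if model.is_skin(p)]
    if not hits:
        return None
    ys = [y for y, _ in hits]
    xs = [x for _, x in hits]
    return (min(ys), min(xs), max(ys), max(xs))


def reject_outliers(found: Sequence[Point], prev: Sequence[Point],
                    prev2: Sequence[Point],
                    max_dev: float) -> Tuple[List[Point], List[bool]]:
    """Replace features that deviate too far from a constant-velocity prediction.

    Returns the corrected positions and a per-feature outlier flag.
    """
    corrected: List[Point] = []
    flags: List[bool] = []
    for f, p1, p2 in zip(found, prev, prev2):
        pred = (2 * p1[0] - p2[0], 2 * p1[1] - p2[1])
        if math.dist(f, pred) > max_dev:
            corrected.append(pred)   # outlier: fall back on the prediction
            flags.append(True)
        else:
            corrected.append(f)
            flags.append(False)
    return corrected, flags
```

In this sketch, an eye, nostril or lip-corner position that jumps implausibly between frames is flagged and replaced by its predicted position, which is one simple way to realize the "detection and prediction of outliers" mentioned above; the actual system may use different geometric constraints.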
