A 3D Head Tracker for an Automatic Lipreading System

A real-world automatic lipreading system must be able to cope with movement of the speaker's head during operation. The observed mouth shape depends not only on the true shape of the mouth, but also on the angle at which the mouth is viewed. As the speaker's head moves and rotates, the viewing angle changes. The resulting distortion can lead to inaccurate mouth measurement and incorrect phoneme recognition. We have developed a system that robustly measures the dimensions of a speaker's mouth whilst the speaker's head is moving and exhibiting rotations of up to 30 degrees away from the camera. Our system tracks the pose of the speaker's head in 3D, detects the mouth by tracking unadorned lip contours, and estimates the 3D locations of the upper and lower lip edges and the mouth corners. The system is demonstrated on a person speaking whilst moving his head in 3D, and the mouth height and width are corrected over 9 seconds of 25 Hz video footage.
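To illustrate why the viewing angle distorts the observed mouth dimensions, the following is a minimal sketch (not the paper's method, which recovers 3D lip-edge locations from the tracked head pose). It assumes the mouth lies roughly in the head's frontal plane, so a yaw rotation foreshortens the apparent width by cos(yaw) and a pitch rotation foreshortens the apparent height by cos(pitch); the function names and parameters are illustrative assumptions.

```python
import numpy as np

def corrected_mouth_size(width_px, height_px, yaw_deg, pitch_deg):
    """Toy foreshortening correction (illustrative assumption only).

    Treats the mouth as lying in the frontal plane of the head, so the
    image-plane width shrinks by cos(yaw) and the height by cos(pitch).
    The actual system described in the abstract instead estimates the 3D
    positions of the lip edges and mouth corners from the tracked head pose.
    """
    yaw = np.radians(yaw_deg)
    pitch = np.radians(pitch_deg)
    true_width = width_px / np.cos(yaw)      # undo horizontal foreshortening
    true_height = height_px / np.cos(pitch)  # undo vertical foreshortening
    return true_width, true_height

# Example: a mouth measured 42 px wide and 18 px high while the head is
# rotated 30 degrees away from the camera in yaw.
print(corrected_mouth_size(42.0, 18.0, yaw_deg=30.0, pitch_deg=0.0))
```

At a 30 degree rotation the apparent width is reduced by roughly 13%, which gives a sense of the measurement error the tracker is designed to remove.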
