Statistical chromaticity-based lip tracking with B-splines

We present a statistical, colour-based technique for lip tracking intended to support personal verification. The lips are automatically localised in the original image: grey-level gradient projections and chromaticity models are used to find the mouth area within an automatically segmented region corresponding to the face. A B-spline, initially elliptic in shape, is then generated to initialise tracking. Tracking proceeds by estimating new lip-contour positions according to a statistical chromaticity model of the lips. These measurements are combined with a Lagrangian formulation of contour dynamics to update the spline control points. The method has been tested on the M2VTS database, where lips were accurately tracked on sequences of speaking subjects of more than one hundred frames. The tracker can be used for feature extraction from the mouth area as well as for model detection in personal verification applications.
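The two ingredients named above can be sketched in a few lines: a pixel-wise lip likelihood from a statistical chromaticity model, and an elliptic set of control points to seed the B-spline contour. This is a minimal illustrative sketch, not the paper's implementation: the Gaussian model in normalised (r, g) chromaticity space, its parameters, and the number of control points are all assumptions, since the abstract does not specify them.

```python
import numpy as np

def chromaticity(rgb):
    """Map RGB pixels (..., 3) to normalised chromaticity (r, g).

    Normalising by the intensity sum discards brightness, which is
    the usual motivation for chromaticity-based colour models.
    """
    s = rgb.sum(axis=-1, keepdims=True)
    s = np.where(s == 0, 1.0, s)          # guard against black pixels
    return rgb[..., :2] / s

def lip_likelihood(rgb, mean, cov):
    """Per-pixel likelihood of being lip-coloured under an assumed
    Gaussian model in (r, g) chromaticity space."""
    rg = chromaticity(rgb)
    d = rg - mean
    inv = np.linalg.inv(cov)
    # squared Mahalanobis distance of every pixel to the lip mean
    m = np.einsum('...i,ij,...j->...', d, inv, d)
    norm = 1.0 / (2.0 * np.pi * np.sqrt(np.linalg.det(cov)))
    return norm * np.exp(-0.5 * m)

def elliptic_control_points(cx, cy, a, b, n=12):
    """Control points of an elliptic contour (centre (cx, cy),
    semi-axes a, b) used to initialise the B-spline tracker."""
    t = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    return np.stack([cx + a * np.cos(t), cy + b * np.sin(t)], axis=1)
```

In a full tracker, the likelihood map would supply measurements along normals to the current spline, which the Lagrangian contour dynamics then fold into an update of the control points at each frame.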
