Automatic lip model extraction for constrained contour-based tracking

Contour model-based tracking is more robust when an accurate reference shape model of the underlying object is available. Because lip shapes vary between users, it is desirable to extract user-dependent lip models automatically from input images. We present an unsupervised segmentation method that hierarchically locates the user's face and then the lips. The techniques employed include modeling in the hue/saturation color space with Gaussian mixture models and the use of geometric constraints. With the region of interest located automatically, model extraction is formulated as a regularized model-fitting problem. Using a generic shape as prior information improves the accuracy of the extracted lip model, which is based on a cubic B-spline representation. We also describe a method to automatically compute an optimal linear color space transform, which is needed to obtain the raw estimates of the lip boundary locations required by the fitting procedure.
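
The regularized fitting step can be illustrated with a minimal sketch. It is not the authors' formulation, which the abstract does not spell out. It assumes a closed uniform cubic B-spline, known curve parameters for each raw boundary estimate, and a simple quadratic penalty that pulls the control points toward those of a generic prior shape. The function names `closed_cubic_bspline_basis` and `fit_control_points`, the weight `lam`, and the elliptical shapes are all hypothetical choices made for illustration.

```python
import numpy as np


def closed_cubic_bspline_basis(t, n_ctrl):
    """Basis matrix (len(t) x n_ctrl) of a closed uniform cubic B-spline.

    t holds curve parameters in [0, n_ctrl); control point indices wrap around.
    """
    t = np.asarray(t, dtype=float) % n_ctrl
    seg = np.floor(t).astype(int)  # index of the active span
    u = t - seg                    # local parameter within the span
    # Standard uniform cubic B-spline blending functions (they sum to 1).
    w = np.stack(
        [
            (1 - u) ** 3,
            3 * u**3 - 6 * u**2 + 4,
            -3 * u**3 + 3 * u**2 + 3 * u + 1,
            u**3,
        ],
        axis=1,
    ) / 6.0
    B = np.zeros((t.size, n_ctrl))
    rows = np.arange(t.size)
    for k in range(4):
        np.add.at(B, (rows, (seg + k) % n_ctrl), w[:, k])
    return B


def fit_control_points(points, t, prior_ctrl, lam):
    """Minimise ||B c - points||^2 + lam * ||c - prior_ctrl||^2 over c.

    Assumed form of the regularizer: a quadratic pull toward the prior shape.
    """
    n_ctrl = prior_ctrl.shape[0]
    B = closed_cubic_bspline_basis(t, n_ctrl)
    A = B.T @ B + lam * np.eye(n_ctrl)
    return np.linalg.solve(A, B.T @ points + lam * prior_ctrl)


# Hypothetical demo: noisy boundary estimates of a wide ellipse,
# regularized toward a rounder generic prior shape.
n_ctrl = 8
angles = 2 * np.pi * np.arange(n_ctrl) / n_ctrl
true_ctrl = np.stack([2.0 * np.cos(angles), 0.7 * np.sin(angles)], axis=1)
prior_ctrl = np.stack([1.5 * np.cos(angles), 1.0 * np.sin(angles)], axis=1)

t = np.linspace(0.0, n_ctrl, 60, endpoint=False)
rng = np.random.default_rng(0)
noisy = closed_cubic_bspline_basis(t, n_ctrl) @ true_ctrl
noisy = noisy + 0.05 * rng.standard_normal(noisy.shape)

fitted = fit_control_points(noisy, t, prior_ctrl, lam=0.5)
print("max deviation from true control points:", np.abs(fitted - true_ctrl).max())
```

In this sketch, setting `lam` to zero reduces the fit to ordinary least squares, while a very large `lam` returns the prior shape almost unchanged. Intermediate values trade fidelity to the raw boundary estimates against closeness to the generic shape.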
