Segmentation of lip pixels for lip tracker initialisation

We propose a novel image segmentation method for lip tracker initialisation, based on a Gaussian mixture model of the pixel RGB values. The model is built using the predictive validation technique advocated by Kittler, Messer and Sadeghi (Second International Conference on Advances in Pattern Recognition, Brazil, March 2001), modified to allow modelling with full covariance matrices. The mixture components are then grouped, and this grouping provides the basis for labelling each pixel as lip or non-lip by Bayes' rule. We test the proposed method on a database of 145 images and demonstrate that its accuracy is significantly better than that of the segmentation obtained by k-means clustering. Moreover, the proposed method does not require the number of segments to be specified a priori.
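The final labelling step described in the abstract can be illustrated with a minimal sketch. It assumes the mixture has already been fitted and its components already assigned to a lip or non-lip group; the predictive validation procedure and the grouping rule themselves are not reproduced here. The function names, the NumPy dependency, and the toy parameters are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def log_gauss(x, mean, cov):
    """Log density of a full-covariance Gaussian at each row of x (N x D)."""
    d = x - mean
    # Solve cov * y = d for every pixel instead of inverting cov explicitly.
    sol = np.linalg.solve(cov, d.T).T
    maha = np.sum(d * sol, axis=1)
    _, logdet = np.linalg.slogdet(cov)
    return -0.5 * (maha + logdet + x.shape[1] * np.log(2.0 * np.pi))


def label_lip(pixels, weights, means, covs, lip_components):
    """Return a boolean mask, True where the lip group is more probable.

    Each group's score is the summed weighted density of its components,
    so comparing the two scores is a Bayes-rule decision between groups.
    """
    log_joint = np.stack(
        [np.log(w) + log_gauss(pixels, m, c)
         for w, m, c in zip(weights, means, covs)],
        axis=1,
    )  # shape (N, K): log(weight_k * density_k(pixel_n))
    is_lip = np.zeros(len(weights), dtype=bool)
    is_lip[list(lip_components)] = True
    lip_score = np.logaddexp.reduce(log_joint[:, is_lip], axis=1)
    non_lip_score = np.logaddexp.reduce(log_joint[:, ~is_lip], axis=1)
    return lip_score > non_lip_score


# Toy mixture (assumed values): component 0 is "lip", 1 and 2 are "non-lip".
cov = np.array([[100.0, 20.0, 20.0],
                [20.0, 100.0, 20.0],
                [20.0, 20.0, 100.0]])
weights = [0.2, 0.5, 0.3]
means = [np.array([180.0, 60.0, 70.0]),    # reddish
         np.array([200.0, 150.0, 130.0]),  # skin-like
         np.array([30.0, 30.0, 30.0])]     # dark background
covs = [cov, cov, cov]

pixels = np.array([[182.0, 62.0, 68.0],
                   [198.0, 152.0, 131.0],
                   [28.0, 33.0, 30.0]])
mask = label_lip(pixels, weights, means, covs, lip_components=[0])
print(mask)
```

Working in the log domain with `np.logaddexp` avoids underflow when a pixel lies far from every component, which is common for full-covariance densities over RGB values.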

[1]  Josef Kittler, et al. Model Validation for Model Selection, 2001, ICAPR.

[2]  Jiri Matas, et al. XM2VTSDB: The Extended M2VTS Database, 1999.

[3]  D. Rubin, et al. Maximum likelihood from incomplete data via the EM algorithm plus discussions on the paper, 1977.

[4]  Josef Kittler, et al. Model complexity validation for PDF estimation using Gaussian mixtures, 1998, Proceedings. Fourteenth International Conference on Pattern Recognition (Cat. No. 98EX170).

[5]  H. McGurk, et al. Hearing lips and seeing voices, 1976, Nature.

[6]  Juergen Luettin, et al. Audio-Visual Speech Modeling for Continuous Speech Recognition, 2000, IEEE Trans. Multim.

[7]  G. Schwarz. Estimating the Dimension of a Model, 1978.

[8]  Alexander H. Waibel, et al. A real-time face tracker, 1996, Proceedings Third IEEE Workshop on Applications of Computer Vision. WACV'96.