Unsupervised lip segmentation under natural conditions

An unsupervised algorithm for segmenting a speaker's lips is presented. A color video sequence of the speaker's face is acquired under natural lighting conditions and without any particular make-up. First, a logarithmic color transform is applied to map the RGB color space to an HI (hue, intensity) color space, and sequence-dependent parameters are evaluated. Second, a statistical approach based on Markov random field modeling segments the mouth shape using the region where red hue predominates, together with motion in a spatiotemporal neighborhood. Simultaneously, a region of interest (ROI) is automatically extracted. Third, the speaker's lip shape is extracted from the final hue field, with good-quality results in this challenging situation.
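The three steps above can be illustrated with a minimal Python/NumPy sketch. It is not the authors' implementation, and several pieces are assumptions. `log_hue` uses a hypothetical log-ratio hue, log(R+1) - log(G+1), instead of the paper's own HI transform. The MRF energy is illustrative: a hue data term, a motion bonus for the lip class (weight `delta`), a Potts-style 4-neighbour spatial term (weight `beta`) and an optional temporal term against the previous frame's labels (weight `gamma`). It is minimised with Iterated Conditional Modes on a checkerboard schedule, swapped in here for simplicity. The ROI is simply the bounding box of the lip label. All function names and weights are hypothetical.

```python
import numpy as np


def log_hue(rgb):
    """Hypothetical red-emphasising hue: log(R+1) - log(G+1).

    NOT the paper's transform; it only mimics the idea that a logarithmic
    ratio makes red lip pixels stand out from skin.
    """
    rgb = rgb.astype(np.float64)
    return np.log1p(rgb[..., 0]) - np.log1p(rgb[..., 1])


def log_intensity(rgb):
    """Log of the mean channel value (a simple, hypothetical log intensity)."""
    return np.log1p(rgb.astype(np.float64).mean(axis=-1))


def isodata_threshold(x, n_iter=20):
    """Two-class iterative (Ridler-Calvard) threshold, used only to initialise."""
    t = x.mean()
    for _ in range(n_iter):
        lo, hi = x[x <= t], x[x > t]
        if lo.size == 0 or hi.size == 0:
            break
        t = 0.5 * (lo.mean() + hi.mean())
    return t


def neighbour_counts(labels):
    """Per pixel: number of 4-neighbours labelled 1, and number of 4-neighbours."""
    lab = np.pad(labels.astype(np.int64), 1)  # zeros outside the image
    one = np.pad(np.ones(labels.shape, dtype=np.int64), 1)

    def sum4(a):
        return a[:-2, 1:-1] + a[2:, 1:-1] + a[1:-1, :-2] + a[1:-1, 2:]

    return sum4(lab), sum4(one)


def mrf_lip_labels(hue, motion, prev_labels=None,
                   beta=1.0, gamma=0.5, delta=1.0, n_iter=5):
    """Binary MRF labelling (1 = lip, 0 = background) minimised by ICM.

    Energy of label l at a pixel (all weights hypothetical):
      (hue - mu_l)^2 / (2 var)          data term, class means re-estimated
      - delta * motion  (lip only)      moving pixels favour the lip class
      + beta * #4-neighbours != l       spatial smoothness
      + gamma * [prev_label != l]       temporal neighbour (optional)
    """
    labels = (hue > isodata_threshold(hue)).astype(np.int64)
    ii, jj = np.indices(hue.shape)
    colours = [(ii + jj) % 2 == 0, (ii + jj) % 2 == 1]
    for _ in range(n_iter):
        if labels.all() or not labels.any():
            break  # one class vanished: its statistics are undefined
        mu1, mu0 = hue[labels == 1].mean(), hue[labels == 0].mean()
        mu = np.where(labels == 1, mu1, mu0)
        var = max(((hue - mu) ** 2).mean(), 1e-6)  # pooled class variance
        for colour in colours:
            # Recount neighbours after each half-sweep of the checkerboard.
            ones, valid = neighbour_counts(labels)
            e1 = (hue - mu1) ** 2 / (2 * var) - delta * motion + beta * (valid - ones)
            e0 = (hue - mu0) ** 2 / (2 * var) + beta * ones
            if prev_labels is not None:
                e1 = e1 + gamma * (prev_labels != 1)
                e0 = e0 + gamma * (prev_labels != 0)
            labels[colour] = (e1 < e0)[colour]
    return labels


def lip_roi(labels, margin=0):
    """Bounding box (row_min, row_max, col_min, col_max) of the lip label."""
    rows, cols = np.nonzero(labels)
    if rows.size == 0:
        return None
    h, w = labels.shape
    return (int(max(rows.min() - margin, 0)), int(min(rows.max() + margin, h - 1)),
            int(max(cols.min() - margin, 0)), int(min(cols.max() + margin, w - 1)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    frame = np.empty((40, 40, 3))
    frame[:] = (180, 120, 100)            # skin-like background
    frame[20:28, 10:30] = (200, 60, 70)   # red "lip" patch
    frame = np.clip(frame + rng.normal(0, 5, frame.shape), 0, 255)
    motion = np.abs(log_intensity(frame) - log_intensity(frame))  # same frame: zero
    labels = mrf_lip_labels(log_hue(frame), motion)
    print(lip_roi(labels))
```

The checkerboard schedule updates every pixel of one colour at once; since a pixel's 4-neighbours always have the other colour, this is equivalent to a sequential ICM sweep. On a real sequence, `motion` could come from a frame difference such as `np.abs(log_intensity(frame) - log_intensity(previous_frame))`, and `prev_labels` from the previous frame's result.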
