Robust lip region segmentation for lip images with complex background

Robust and accurate lip region segmentation is vital for lip image analysis. However, most current techniques break down in the presence of mustaches and beards, which make the background region complex and inhomogeneous. In this paper we propose a novel multi-class, shape-guided fuzzy c-means (MS-FCM) clustering algorithm to solve this problem. In this approach, a single cluster is assigned to the object, i.e. the lip region, and a combination of multiple clusters to the background, which generally includes the skin region, lip shadow, or beards. The number of background clusters is determined automatically by maximizing a cluster validity index. A spatial penalty term that accounts for pixel location is incorporated into the objective function so that pixels with similar color but located in different regions can be differentiated. This facilitates the separation of lip and background pixels that would otherwise be inseparable because of their similar color. Experimental results show that the proposed algorithm provides an accurate lip-background partition even for images with complex background features such as mustaches and beards.
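
The idea of one lip cluster, several background clusters, and a spatial penalty in the objective function can be illustrated with a minimal sketch. This is not the MS-FCM formulation itself, which the abstract does not specify. The sketch assumes a standard fuzzy c-means objective with an additive penalty, J = sum_i sum_k u_ik^m (||x_k - v_i||^2 + p_ik), an ellipse of known center and semi-axes as the shape prior, a hand-picked penalty form and weight `alpha`, and a synthetic 20x20 image. All function names and parameters are hypothetical.

```python
import numpy as np

def elliptic_distance(rows, cols, center, axes):
    """Normalised elliptic distance e per pixel; e < 1 lies inside the ellipse."""
    return ((rows - center[0]) / axes[0]) ** 2 + ((cols - center[1]) / axes[1]) ** 2

def spatial_penalty(e, n_background, alpha=1.0):
    """Assumed penalty: the lip cluster (row 0) is penalised outside the ellipse,
    every background cluster is penalised inside it."""
    lip = alpha * np.maximum(e - 1.0, 0.0)
    bkg = alpha * np.maximum(1.0 - e, 0.0)
    return np.vstack([lip] + [bkg] * n_background)

def spatial_fcm(colors, penalty, n_iter=50, m=2.0, seed=0):
    """Fuzzy c-means whose squared colour distance is augmented by a fixed
    spatial penalty. colors: (N, d), penalty: (C, N).
    Returns memberships U (C, N) and centroids V (C, d)."""
    rng = np.random.default_rng(seed)
    U = rng.random(penalty.shape)
    U /= U.sum(axis=0, keepdims=True)          # memberships sum to 1 per pixel
    for _ in range(n_iter):
        W = U ** m
        # The penalty does not depend on the centroids, so the usual
        # weighted-mean centroid update still applies.
        V = (W @ colors) / W.sum(axis=1, keepdims=True)
        d2 = ((colors[None, :, :] - V[:, None, :]) ** 2).sum(axis=2)
        D = d2 + penalty + 1e-9                # penalised dissimilarity
        inv = D ** (-1.0 / (m - 1.0))
        U = inv / inv.sum(axis=0, keepdims=True)
    return U, V

# Synthetic image (assumption): skin everywhere, a dark beard band at the bottom,
# a lip-coloured ellipse in the middle, and a lip-coloured patch in one corner.
h = w = 20
rows, cols = (a.ravel() for a in np.mgrid[0:h, 0:w])
e = elliptic_distance(rows, cols, center=(10, 10), axes=(3, 6))
lip_mask = e < 1.0
beard_mask = rows >= 16
patch_mask = (rows < 3) & (cols < 3)           # same colour as the lip, wrong place

skin, beard, lip = [0.9, 0.7, 0.6], [0.2, 0.2, 0.2], [0.8, 0.2, 0.3]
colors = np.tile(np.array(skin), (h * w, 1))
colors[beard_mask] = beard
colors[lip_mask] = lip
colors[patch_mask] = lip
noise = np.random.default_rng(1).normal(0.0, 0.02, colors.shape)
colors = np.clip(colors + noise, 0.0, 1.0)

# One lip cluster plus two background clusters (fixed here, not selected
# by a validity index as in the abstract).
U, V = spatial_fcm(colors, spatial_penalty(e, n_background=2))
lip_membership = U[0]                          # background membership is U[1:].sum(axis=0)
print("mean lip membership inside ellipse:", lip_membership[lip_mask].mean())
print("mean lip membership in corner patch:", lip_membership[patch_mask].mean())
```

The corner patch has the same color as the lip region, so color alone cannot separate the two. In this sketch the penalty is what pulls the patch toward the background clusters. Choosing the number of background clusters by maximizing a validity index, as the abstract describes, would wrap `spatial_fcm` in a loop over `n_background`; that step is omitted here because the abstract does not name the index used.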
