Lip Localization and Viseme Recognition from Video Sequences

Viseme (visual cue) recognition is one step in building an automated lip-reading system. To recognize a viseme, the speaker's lips must first be detected in the video sequence and then tracked so that feature vectors can be extracted for the final recognition. This paper proposes a novel lip-localization method based on color models. It also presents the basic lip shapes that depict the visual cues, together with their mapping to the corresponding phonemes. Finally, the feature vectors produced by the lip-localization algorithm are mapped to the visual cues.
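As a rough illustration of what color-based lip localization can look like, the sketch below thresholds a red-to-green ratio (often called pseudo-hue) and returns the bounding box of the pixels that pass. This is not the method proposed here, whose color model is not specified in this summary. The names `pseudo_hue` and `localize_lips`, the threshold of 0.6, and the synthetic skin and lip colors are all assumptions chosen only to make the example run.

```python
def pseudo_hue(pixel):
    """Return R / (R + G) for one (r, g, b) pixel; 0.0 for a black pixel."""
    r, g, _ = pixel
    total = r + g
    return r / total if total else 0.0


def localize_lips(frame, threshold=0.6):
    """Return (top, left, bottom, right) of pixels above the threshold, or None.

    `frame` is a list of rows, each row a list of (r, g, b) tuples.
    The threshold is an illustrative assumption, not a tuned value.
    """
    rows, cols = [], []
    for y, row in enumerate(frame):
        for x, pixel in enumerate(row):
            if pseudo_hue(pixel) > threshold:
                rows.append(y)
                cols.append(x)
    if not rows:
        return None
    return min(rows), min(cols), max(rows), max(cols)


# Synthetic 60 x 80 frame: a skin-colored background with a redder patch.
SKIN = (200, 150, 120)  # hypothetical skin color, pseudo-hue about 0.57
LIP = (180, 60, 70)     # hypothetical lip color, pseudo-hue 0.75
frame = [
    [LIP if 40 <= y < 50 and 30 <= x < 55 else SKIN for x in range(80)]
    for y in range(60)
]

box = localize_lips(frame)
print(box)
```

On real video a fixed threshold is unlikely to be enough: lighting and skin tone shift the ratio, so a practical system would typically restrict the search to a detected face region and adapt the threshold per speaker.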
