Paper Information - A Study on Lip Localization Techniques used for Lip reading from a Video

This paper discusses and compares several techniques for localizing the lips within a face, along with their processing steps. Lip localization is the basic step needed to read lips and extract visual information from video input. The techniques can be applied to asymmetric lips and to mouths with visible teeth, a visible tongue, or a moustache. Lip reading generally proceeds in three steps: first, the lips are located in the first frame of the video; next, the lips are tracked through the following frames using the pixel points produced by the first step; finally, the tracked lip model is converted to its matching letter to yield the visual information. A new approach is also proposed on the basis of the techniques discussed. Lip reading is useful for Automatic Speech Recognition in communication systems when the audio is absent or at a low level, with or without noise. Human-computer communication will also require speech recognition.

K. K. Thyagharajan | S. D. Lalitha
