Human-Computer Interface Based on Visual Lip Movement and Gesture Recognition

A multimodal human-computer interface (HCI) called LipMouse is presented, which allows a user to operate a computer using only movements and gestures of the mouth. The algorithms for lip movement tracking and lip gesture recognition are presented in detail. Images of the user's face are captured with a standard webcam. Face detection is based on a cascade of boosted classifiers using Haar-like features. The mouth region is located in the lower part of the detected face region. Its position is used to track lip movements, which allows the user to control the screen cursor. Three lip gestures are recognized: opening the mouth, sticking out the tongue, and puckering the lips. Lip gesture recognition is performed by an artificial neural network that uses various image features of the lip region. An accurate lip shape is obtained by means of lip image segmentation using fuzzy clustering.
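
The step from a detected face to a cursor movement can be illustrated with a minimal sketch. It is not the LipMouse implementation: the names `Box`, `mouth_region` and `cursor_velocity`, the choice of the lower third of the face box, and the `gain` and `dead_zone` parameters are all assumptions made for illustration. The face box is assumed to come from some face detector, such as a Haar cascade.

```python
from typing import NamedTuple, Tuple


class Box(NamedTuple):
    """Axis-aligned rectangle in image pixels (hypothetical helper type)."""
    x: int
    y: int
    w: int
    h: int


def mouth_region(face: Box, lower_fraction: float = 1 / 3) -> Box:
    """Return the bottom part of the face box as the mouth search region.

    lower_fraction is an assumed value; the abstract only says that the
    mouth lies in the lower part of the face region.
    """
    if not 0.0 < lower_fraction <= 1.0:
        raise ValueError("lower_fraction must be in (0, 1]")
    mouth_h = int(round(face.h * lower_fraction))
    return Box(face.x, face.y + face.h - mouth_h, face.w, mouth_h)


def center(box: Box) -> Tuple[float, float]:
    """Return the centre point of a box."""
    return (box.x + box.w / 2.0, box.y + box.h / 2.0)


def cursor_velocity(mouth: Box, reference: Box,
                    gain: float = 2.0,
                    dead_zone: float = 3.0) -> Tuple[float, float]:
    """Map the mouth displacement from a reference position to a cursor step.

    Displacements within dead_zone pixels are ignored on each axis so that
    small tracking jitter does not move the cursor (an assumed design choice).
    """
    mx, my = center(mouth)
    rx, ry = center(reference)
    dx, dy = mx - rx, my - ry
    vx = gain * dx if abs(dx) > dead_zone else 0.0
    vy = gain * dy if abs(dy) > dead_zone else 0.0
    return (vx, vy)


# Usage: a reference face box, then the same face shifted 10 px to the right.
reference_mouth = mouth_region(Box(100, 80, 120, 150))
current_mouth = mouth_region(Box(110, 80, 120, 150))
step = cursor_velocity(current_mouth, reference_mouth)
```

Treating the displacement as a velocity rather than an absolute position is one possible mapping; a direct position mapping would be an equally valid reading of the abstract.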
