Real-time genetic lips region detection and tracking in natural video scenes

This paper addresses real-time detection and tracking of the lips region of a talking person in natural scenes. In particular, we aim to acquire numerical parameters that represent the lips information, because this information is important for many applications, such as audio-visual speech recognition, robot perception, and mobile device interfaces. The difficulty lies in the deformation and geometric change of the lips caused by speech and by free camera work. Our proposed system is based on template matching with genetic algorithms (GAs). Our previous system suffered from a trade-off between accuracy and processing time. We overcome this trade-off with two new methods: (a) flexible control of the search domain and (b) inheritance of genetic information between video frames. We demonstrated the effectiveness of the proposed system on several 5-second video sequences. On average, the accuracy is 94.44% and the processing time is 4.50 seconds.
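To make the two ideas concrete, the following is a minimal sketch of GA-based template matching with a bounded search domain and a population carried over between frames. It is not the system described above. The chromosome (a plain `(x, y)` position), the sum-of-absolute-differences cost, the window-shaped domain, and every name (`sad`, `search_domain`, `ga_match`, `seed_pop`) are assumptions chosen for illustration only.

```python
import random


def clamp(v, lo, hi):
    """Limit v to the closed interval [lo, hi]."""
    return max(lo, min(hi, v))


def sad(frame, template, x, y):
    """Sum of absolute differences between the template and the frame patch at (x, y)."""
    return sum(
        abs(frame[y + j][x + i] - template[j][i])
        for j in range(len(template))
        for i in range(len(template[0]))
    )


def search_domain(center, radius, frame, template):
    """Assumed domain control: a window of +/- radius around `center`,
    clipped so the template always stays inside the frame.
    Returns (x_min, x_max, y_min, y_max)."""
    max_x = len(frame[0]) - len(template[0])
    max_y = len(frame) - len(template)
    cx, cy = center
    return (clamp(cx - radius, 0, max_x), clamp(cx + radius, 0, max_x),
            clamp(cy - radius, 0, max_y), clamp(cy + radius, 0, max_y))


def ga_match(frame, template, domain, seed_pop=None,
             pop_size=20, generations=30, rng=None):
    """Search `domain` for the template position with the lowest SAD cost.

    seed_pop holds individuals inherited from the previous frame
    (a hypothetical interface). Returns (best_individual, final_population).
    """
    if rng is None:
        rng = random.Random(0)
    x_min, x_max, y_min, y_max = domain

    def cost(ind):
        return sad(frame, template, ind[0], ind[1])

    # Inherited individuals are clamped into the current domain;
    # the remaining slots are filled with random positions.
    pop = [(clamp(x, x_min, x_max), clamp(y, y_min, y_max))
           for x, y in (seed_pop or [])][:pop_size]
    while len(pop) < pop_size:
        pop.append((rng.randint(x_min, x_max), rng.randint(y_min, y_max)))

    for _ in range(generations):
        pop.sort(key=cost)
        elite = pop[: pop_size // 2]  # elitism: the better half survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            a, b = rng.choice(elite), rng.choice(elite)
            # Crossover takes x from one parent and y from the other;
            # mutation shifts each coordinate by at most one pixel.
            child = (clamp(a[0] + rng.randint(-1, 1), x_min, x_max),
                     clamp(b[1] + rng.randint(-1, 1), y_min, y_max))
            children.append(child)
        pop = elite + children

    pop.sort(key=cost)
    return pop[0], pop
```

In a tracking loop under these assumptions, each frame would call `search_domain` with the previous best position and pass the previous final population as `seed_pop`. A small domain and a warm-started population reduce the number of generations needed, which is one plausible way to ease the accuracy versus processing-time trade-off.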
