High speed genetic lips detection by dynamic search domain control (Special Issue: Image Application Technology for System Control)

This paper proposes a high-speed, size- and orientation-invariant method for detecting the lips of a talking person in an active scene, using template matching and genetic algorithms (GA). A further objective is to obtain numerical parameters that represent the lips. This information is important for many applications that demand high performance, such as audio-visual speech recognition, speaker identification systems, robot perception, and interfaces for personal mobile devices. Lips detection is difficult mainly because the lips deform and change geometry during speech, and because free camera motion makes the scene active. To improve both speed and accuracy, we first improve performance on a single still image, which is the basis of video processing. The proposed system is based on template matching using a GA. Only one template is prepared per experiment: the closed mouth of the subject, because the intended application is personal devices. In our previous study, the main problem was the trade-off between search accuracy and search speed. To overcome it, we use two methods, a scaling window and dynamic search domain control (SD-Control), and we focus on the population size of the GA, because it directly affects both search accuracy and speed. The effectiveness of the proposed system is demonstrated by computer simulations. We achieved a lips detection accuracy of 91.33% at an average processing time of 33.70 milliseconds per frame.
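
The abstract gives no implementation details, so the following Python sketch only illustrates the general idea of GA-driven template matching with a search domain that narrows over time. Everything in it is an assumption made for illustration: the fitness (negative mean absolute difference), the GA operators, the window-shrinking rule, and the names `match_score`, `ga_template_search`, `shrink`, and `min_half_width`. It is not the authors' SD-Control or scaling-window method, and it searches over position only, leaving out size and orientation.

```python
import numpy as np


def match_score(image, template, x, y):
    """Negative mean absolute difference between the template and the patch at (x, y)."""
    h, w = template.shape
    patch = image[y:y + h, x:x + w]
    return -float(np.mean(np.abs(patch - template)))


def ga_template_search(image, template, pop_size=30, generations=60,
                       shrink=0.9, min_half_width=4, seed=0):
    """Search for the template's top-left corner (x, y) with a small GA."""
    rng = np.random.default_rng(seed)
    h, w = template.shape
    upper = np.array([image.shape[1] - w, image.shape[0] - h])  # max valid (x, y)
    lower = np.zeros(2, dtype=int)

    # Initial population: uniform over the whole valid domain.
    pop = rng.integers(lower, upper + 1, size=(pop_size, 2))
    half_width = float(upper.max())

    for _ in range(generations):
        fitness = np.array([match_score(image, template, x, y) for x, y in pop])
        order = np.argsort(fitness)[::-1]  # best individual first
        pop = pop[order]
        best = pop[0]

        # Assumed domain control: shrink a window centred on the best individual.
        half_width = max(min_half_width, half_width * shrink)
        lo = np.maximum(lower, np.floor(best - half_width)).astype(int)
        hi = np.minimum(upper, np.ceil(best + half_width)).astype(int)

        children = [best.copy()]  # elitism: the best individual survives unchanged
        while len(children) < pop_size:
            # Binary tournament: after sorting, a lower index means better fitness.
            p1 = pop[min(rng.integers(0, pop_size, size=2))]
            p2 = pop[min(rng.integers(0, pop_size, size=2))]
            mask = rng.integers(0, 2, size=2).astype(bool)  # uniform crossover
            child = np.where(mask, p1, p2)
            # Gaussian mutation, rounded to whole pixels.
            child = child + np.rint(rng.normal(0.0, 2.0, size=2)).astype(int)
            children.append(np.clip(child, lo, hi))  # keep inside current domain
        pop = np.array(children)

    fitness = np.array([match_score(image, template, x, y) for x, y in pop])
    i = int(np.argmax(fitness))
    return int(pop[i, 0]), int(pop[i, 1]), float(fitness[i])


# Synthetic demo: a smooth bright blob stands in for the mouth region.
yy, xx = np.mgrid[0:96, 0:96]
image = np.exp(-((xx - 60) ** 2 + (yy - 40) ** 2) / (2 * 12.0 ** 2))
true_x, true_y = 52, 32
template = image[true_y:true_y + 16, true_x:true_x + 16].copy()

x, y, score = ga_template_search(image, template)
print("found:", (x, y), "true:", (true_x, true_y), "score:", score)
```

In this sketch `pop_size` sets the number of fitness evaluations per generation, which is one simple way to see why population size trades search speed against accuracy. The shrinking window concentrates later generations near the current best match.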
