ON-LINE HEAD POSE ESTIMATION WITH BINOCULAR HAND-EYE ROBOT BASED ON EVOLUTIONARY MODEL-BASED MATCHING

This paper presents a method for estimating the 3D pose of a human head from two images input by stereo cameras. The proposed method combines an evolutionary search technique, the 1-Step genetic algorithm (1-Step GA), adapted for real-time recognition in dynamic images, with a fitness evaluation based on stereo model matching. The head position and orientation are detected simultaneously using the brightness distribution and color information of the input images, evaluating facial features such as the eyebrows and eyes. Moreover, to improve the dynamics of recognition, feedforward model-based matching is proposed for hand-eye visual servoing. The effectiveness of the method is shown by experiments in which the motion of the hand-eye camera is compensated against the relative motion of the object in the camera frame, resulting in recognition that is robust against the hand-eye motion.

INTRODUCTION

This work is motivated by our desire to establish a visual system for a patient robot used to evaluate the medical-treatment skills of nursing students, as shown in Fig. 1. During procedures such as injection, it is important for nurses to pay attention to the patient's condition and to sense small signs of the patient's state so as to avoid medical accidents. What is most important for nurses is to check the patient's face periodically and carefully to infer the patient's internal condition. To evaluate these nursing abilities, the patient robot has to track the nurse's head pose; the robot can then judge whether the student is giving the patient a good treatment. The behavior of the patient robot in positioning its head relative to the nurse's, in order to observe the nurse's head pose and gaze direction, is a form of visual servoing to a 3D pose. There is a variety of approaches to representing face poses, and they can be classified into three general categories: feature-based, appearance-based, and model-based.
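The 1-Step GA mentioned above differs from a conventional GA in that exactly one generation of selection, crossover, and mutation is applied per incoming video frame, so the population of candidate poses tracks the moving target in real time. The following is a minimal sketch of that idea, not the paper's implementation: the pose chromosome, parameter ranges, and operator details are illustrative, and the fitness function is a stand-in for the stereo model-matching correlation described in the paper.

```python
import random

POP_SIZE = 30
# Pose chromosome: (x, y, z, roll, pitch, yaw); ranges are illustrative.
BOUNDS = [(-100, 100), (-100, 100), (300, 800),
          (-0.5, 0.5), (-0.5, 0.5), (-0.5, 0.5)]

def random_pose():
    return [random.uniform(lo, hi) for lo, hi in BOUNDS]

def fitness(pose, frame):
    # Placeholder for the stereo model-matching correlation: in the paper
    # this evaluates brightness and color of eyebrow/eye regions under the
    # projected face model in both camera images.  Here a dummy measure
    # rewards poses near a "true" pose hidden in `frame`.
    return -sum((p - t) ** 2 for p, t in zip(pose, frame["true_pose"]))

def one_step_ga(population, frame):
    """Apply exactly one GA generation per video frame (1-Step GA)."""
    scored = sorted(population, key=lambda p: fitness(p, frame), reverse=True)
    elite = scored[: len(scored) // 2]            # selection: keep best half
    children = []
    while len(elite) + len(children) < len(population):
        a, b = random.sample(elite, 2)
        child = [(x + y) / 2 for x, y in zip(a, b)]       # crossover: midpoint
        child = [g + random.gauss(0, 0.02 * (hi - lo))    # mutation, scaled
                 for g, (lo, hi) in zip(child, BOUNDS)]   # per gene range
        children.append(child)
    new_pop = elite + children
    return new_pop, new_pop[0]        # best individual = current pose estimate
```

Because the best half of the population survives each frame, the pose estimate never degrades between frames, and the search converges while the video runs rather than before it.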
Feature-based approaches use local features such as points, line segments, edges, or regions. The main idea is to select a set of feature points, which are matched against the incoming video to update the pose estimate. In a feature-based approach, estimation based on the relationship between human facial features [1], [2] relies heavily on the accuracy of the facial-feature detection scheme; such detection is often inaccurate and fails because it is affected by identity, distance from the camera, facial expression, noise, illumination changes, and occlusion [3], [4]. Appearance-based approaches attempt to capture and define the face as a whole, and have received a lot of attention recently. The image is compared with various templates to determine which one most closely matches it, which makes recognition time-consuming. In one technique [5], templates representing facial features are used to determine head position and orientation. Model-based approaches, on the other hand, use a model to search for a target object in the image, the model being composed according to how the target object appears in the input image. For recognizing a face and detecting its pose, 3-D models of the face are mapped onto input images to estimate the head pose [6], [7]. The recognition method developed in this paper falls into this category.
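The core of model-based matching is a fitness function that scores how well a face model, placed at a candidate pose, explains the image. The sketch below illustrates the idea for a single camera using a grayscale image; the paper evaluates the same kind of correlation in both images of the stereo pair. All names and the `project` mapping here are illustrative assumptions, not the paper's API.

```python
import numpy as np

def model_fitness(image, pose, model_points, project):
    """Score how well a face model at `pose` explains a grayscale image.

    `model_points` holds two illustrative point sets: "dark" points
    (eyebrows, eyes) that should land on low-brightness pixels, and
    "skin" points that should land on high-brightness pixels.
    `project` maps a model point under `pose` to integer pixel
    coordinates (u, v).
    """
    score = 0.0
    h, w = image.shape
    for p in model_points["dark"]:        # eyebrows / eyes: reward darkness
        u, v = project(p, pose)
        if 0 <= v < h and 0 <= u < w:
            score += 1.0 - image[v, u] / 255.0
    for p in model_points["skin"]:        # surrounding skin: reward brightness
        u, v = project(p, pose)
        if 0 <= v < h and 0 <= u < w:
            score += image[v, u] / 255.0
    return score
```

A search over poses (for example with the 1-Step GA) then keeps the pose whose projected model best separates dark facial features from bright skin.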

[1] Michel Dhome et al., "Real time 3D template matching," Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR 2001), 2001.

[2] Mamoru Minami et al., "Manipulator visual servoing and tracking of fish using a genetic algorithm," 1999.

[3] Roberto Cipolla et al., "Visual tracking and control using Lie algebras," Proceedings of the 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 1999.

[4] H. Suzuki et al., "Visual servoing to catch fish using global/local GA search," IEEE/ASME Transactions on Mechatronics, 2005.

[5] Vincent Lepetit et al., "Stable real-time 3D tracking using online and offline information," IEEE Transactions on Pattern Analysis and Machine Intelligence, 2004.

[6] Roberto Brunelli et al., "Estimation of pose and illuminant direction for face processing," Image and Vision Computing, 1994.

[7] William T. Freeman et al., "Example-based head tracking," Proceedings of the Second International Conference on Automatic Face and Gesture Recognition, 1996.

[8] Mamoru Minami et al., "Real-time face detection using hybrid GA based on selective attention," 2004 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), 2004.

[9] Amir Averbuch et al., "Fast motion estimation using bidirectional gradient methods," IEEE Transactions on Image Processing, 2004.

[10] Hongbin Zha et al., "Cooperative manipulations based on genetic algorithms using contact information," Proceedings 1995 IEEE/RSJ International Conference on Intelligent Robots and Systems, 1995.

[11] David E. Goldberg, Genetic Algorithms in Search, Optimization and Machine Learning, 1988.

[12] D. H. Mellor, "Real time," 1981.