Predictive Tracking in Vision-based Hand Pose Estimation using Unscented Kalman Filter and Multi-viewpoint Cameras

One of the major challenges in human-robot interaction is enabling unrestricted hand motion as a tool for communication. Using the hand directly as an input device lets users interact with systems more naturally, allowing those systems to become an integral part of daily life. A vision-based approach, which uses cameras to capture data, supports non-contact and unrestricted movement of the hand. Nonetheless, the hand's high number of degrees of freedom (DOF) is a central difficulty in articulated hand motion tracking and pose estimation. In this paper, we present our vision-based, model-based approach, which uses multiple cameras and predictive filtering to estimate the pose of the hand. We build on the research of Ueda et al., whose method estimates the global pose (wrist position and palm orientation) and the local pose (finger joint angles) separately, but not simultaneously (Ueda et al., 2003). We address this limitation with a non-linear filter, the Unscented Kalman Filter (UKF), which tracks the motion and simultaneously estimates the global and local poses of the hand. The rest of the paper is organized as follows. Section 2 presents related work and Section 3 discusses the UKF. Section 4 explains the hand pose estimation system and Section 5 details how we use the UKF for tracking and pose estimation. Experimental results and discussion are presented in Section 6.
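As a rough illustration of the idea behind the UKF (not the implementation used in this work), the sketch below shows the unscented transform in its original kappa-parameterized form: a mean and covariance are represented by 2n+1 sigma points, each point is passed through a non-linear function, and the output mean and covariance are recovered as weighted sums. The function names, the two-joint planar finger model, the link lengths, and the numeric values are all hypothetical and chosen only for illustration.

```python
import math


def cholesky(a):
    """Return lower-triangular L with L * L^T = a (a must be symmetric positive-definite)."""
    n = len(a)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(a[i][i] - s)
            else:
                low[i][j] = (a[i][j] - s) / low[j][j]
    return low


def sigma_points(mean, cov, kappa=0.0):
    """Build 2n+1 sigma points and their weights from a mean vector and covariance matrix."""
    n = len(mean)
    scaled = [[(n + kappa) * cov[i][j] for j in range(n)] for i in range(n)]
    low = cholesky(scaled)
    points = [list(mean)]
    weights = [kappa / (n + kappa)]
    for j in range(n):
        col = [low[i][j] for i in range(n)]  # j-th column of the matrix square root
        points.append([m + c for m, c in zip(mean, col)])
        points.append([m - c for m, c in zip(mean, col)])
        weights += [1.0 / (2.0 * (n + kappa))] * 2
    return points, weights


def unscented_transform(mean, cov, f, kappa=0.0):
    """Propagate (mean, cov) through f and return the estimated output mean and covariance."""
    points, weights = sigma_points(mean, cov, kappa)
    ys = [f(p) for p in points]
    m = len(ys[0])
    y_mean = [sum(w * y[i] for w, y in zip(weights, ys)) for i in range(m)]
    y_cov = [
        [sum(w * (y[i] - y_mean[i]) * (y[j] - y_mean[j]) for w, y in zip(weights, ys))
         for j in range(m)]
        for i in range(m)
    ]
    return y_mean, y_cov


def fingertip(theta, l1=4.0, l2=3.0):
    """Hypothetical planar two-joint finger: joint angles (rad) -> fingertip (x, y)."""
    t1, t2 = theta
    return [l1 * math.cos(t1) + l2 * math.cos(t1 + t2),
            l1 * math.sin(t1) + l2 * math.sin(t1 + t2)]


if __name__ == "__main__":
    # Illustrative joint-angle estimate with some uncertainty (values are made up).
    angle_mean = [0.3, 0.5]
    angle_cov = [[0.02, 0.0], [0.0, 0.05]]
    tip_mean, tip_cov = unscented_transform(angle_mean, angle_cov, fingertip, kappa=1.0)
    print("fingertip mean:", tip_mean)
    print("fingertip covariance:", tip_cov)
```

For a linear function the transform reproduces the exact mean and covariance; for a non-linear mapping such as joint angles to fingertip position it gives an approximation without requiring the Jacobians that an Extended Kalman Filter would need. A full UKF would apply this transform twice per time step, once for the motion model and once for the observation model.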

[1] Tsukasa Ogasawara et al. Model-based hand pose estimation using multiple viewpoint silhouette images and Unscented Kalman Filter, 2008, RO-MAN 2008 - The 17th IEEE International Symposium on Robot and Human Interactive Communication.

[2] Richard Szeliski et al. Rapid octree construction from image sequences, 1993.

[3] Tsukasa Ogasawara et al. A hand-pose estimation for vision-based human interfaces, 2003, IEEE Trans. Ind. Electron.

[4] Rajeev Sharma et al. Tracking hand dynamics in unconstrained environments, 1998, Proceedings Third IEEE International Conference on Automatic Face and Gesture Recognition.

[5] Rudolph van der Merwe et al. The unscented Kalman filter for nonlinear estimation, 2000, Proceedings of the IEEE 2000 Adaptive Systems for Signal Processing, Communications, and Control Symposium (Cat. No.00EX373).

[6] Paulo R. S. Mendonça et al. Model-based 3D tracking of an articulated hand, 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[7] Stan Sclaroff et al. Estimating 3D hand pose from a cluttered image, 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings.

[8] Nicol N. Schraudolph et al. 3D hand tracking by rapid stochastic gradient descent using a skinning model, 2004.

[9] Jeffrey K. Uhlmann et al. Reduced sigma point filters for the propagation of means and covariances through nonlinear transformations, 2002, Proceedings of the 2002 American Control Conference (IEEE Cat. No.CH37301).

[10] Takeo Kanade et al. DigitEyes: vision-based hand tracking for human-computer interaction, 1994, Proceedings of 1994 IEEE Workshop on Motion of Non-rigid and Articulated Objects.

[11] Chung-Lin Huang et al. A model-based hand gesture recognition system, 2001, Machine Vision and Applications.

[12] Chung-Lin Huang et al. Model-based articulated hand motion tracking for gesture recognition, 1998, Image Vis. Comput.

[13] Akira Utsumi et al. Multiple-hand-gesture tracking using multiple cameras, 1999, Proceedings. 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No PR00149).

[14] Yoshiaki Shirai et al. Hand gesture estimation and model refinement using monocular camera-ambiguity limitation by inequality constraints, 1998, Proceedings Third IEEE International Conference on Automatic Face and Gesture Recognition.

[15] J.J. LaViola Jr. et al. A comparison of unscented and extended Kalman filtering for estimating quaternion motion, 2003, Proceedings of the 2003 American Control Conference, 2003.

[16] Björn Stenger et al. Hand Pose Estimation Using Hierarchical Detection, 2004, ECCV Workshop on HCI.

[17] Gordon Cheng et al. Unconstrained Real-time Markerless Hand Tracking for Humanoid Interaction, 2006, 2006 6th IEEE-RAS International Conference on Humanoid Robots.

[18] Yoshiaki Shirai et al. Real-time 3D hand posture estimation based on 2D appearance retrieval using monocular camera, 2001, Proceedings IEEE ICCV Workshop on Recognition, Analysis, and Tracking of Faces and Gestures in Real-Time Systems.

[19] Björn Stenger et al. Learning a Kinematic Prior for Tree-Based Filtering, 2003, BMVC.

[20] Tsukasa Ogasawara et al. Hand pose estimation using voxel-based individualized hand model, 2009, 2009 IEEE/ASME International Conference on Advanced Intelligent Mechatronics.

[21] Olivier D. Faugeras et al. 3D articulated models and multi-view tracking with silhouettes, 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[22] Jeffrey K. Uhlmann et al. New extension of the Kalman filter to nonlinear systems, 1997, Defense, Security, and Sensing.

[23] Mircea Nicolescu et al. Vision-based hand pose estimation: A review, 2007, Comput. Vis. Image Underst.

[24] C. W. Chan et al. Performance evaluation of UKF-based nonlinear filtering, 2006, Autom.

[25] Vladimir Pavlovic et al. Visual Interpretation of Hand Gestures for Human-Computer Interaction: A Review, 1997, IEEE Trans. Pattern Anal. Mach. Intell.

[26] Ying Wu et al. Capturing natural hand articulation, 2001, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001.