Full DOF tracking of a hand interacting with an object by modeling occlusions and physical constraints

Due to occlusions, estimating the full pose of a human hand interacting with an object is much more challenging than recovering the pose of a hand observed in isolation. In this work, we formulate an optimization problem whose solution is the 26-DOF hand pose together with the pose and model parameters of the manipulated object. The optimization seeks the joint hand-object model that (a) best explains the incompleteness of observations resulting from occlusions due to hand-object interaction and (b) is physically plausible, in the sense that the hand does not share the same physical space with the object. The proposed method is the first to efficiently solve the continuous, full-DOF, joint hand-object tracking problem based solely on markerless multicamera input. Additionally, it is the first to demonstrate how hand-object interaction can be exploited as context that facilitates hand pose estimation, instead of being treated as a complicating factor. Extensive quantitative and qualitative experiments with simulated and real-world image sequences, as well as a comparative evaluation against a state-of-the-art method for pose estimation of isolated hands, support these findings.
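As a rough illustration of criterion (b), the sketch below shows one way a physical-plausibility term could be added to an observation-discrepancy score. It is not the formulation used in this work: the sphere approximation of the hand and object, the function names `penetration_penalty` and `objective`, and the `weight` parameter are all assumptions made for the example.

```python
import numpy as np


def penetration_penalty(hand_centers, hand_radii, obj_center, obj_radius):
    """Sum of penetration depths between hand spheres and one object sphere.

    Hypothetical simplification: the hand is a set of spheres and the
    object is a single sphere. Non-overlapping pairs contribute zero.
    """
    hand_centers = np.asarray(hand_centers, dtype=float)
    hand_radii = np.asarray(hand_radii, dtype=float)
    obj_center = np.asarray(obj_center, dtype=float)

    # Distance from each hand sphere center to the object sphere center.
    dists = np.linalg.norm(hand_centers - obj_center, axis=1)
    # Positive depth means the two spheres overlap by that amount.
    depth = hand_radii + obj_radius - dists
    return float(np.sum(np.clip(depth, 0.0, None)))


def objective(observation_error, hand_centers, hand_radii,
              obj_center, obj_radius, weight=10.0):
    """Score of one joint hand-object hypothesis (lower is better).

    `observation_error` stands in for the discrepancy between the rendered
    hypothesis and the multicamera observations; `weight` is an assumed
    trade-off factor, not a value taken from the paper.
    """
    penalty = penetration_penalty(hand_centers, hand_radii,
                                  obj_center, obj_radius)
    return observation_error + weight * penalty


# Two unit-radius hand spheres on either side of a unit-radius object.
hand_centers = [[0.0, 0.0, 0.0], [3.0, 0.0, 0.0]]
hand_radii = [1.0, 1.0]
overlapping = penetration_penalty(hand_centers, hand_radii, [1.5, 0.0, 0.0], 1.0)
separated = penetration_penalty(hand_centers, hand_radii, [10.0, 0.0, 0.0], 1.0)
score = objective(2.0, hand_centers, hand_radii, [1.5, 0.0, 0.0], 1.0)
```

An optimizer searching over hand and object parameters would then prefer hypotheses that both explain the images and keep this penalty at zero.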
