Capturing Hands in Action Using Discriminative Salient Points and Physics Simulation

Hand motion capture is a popular research field, recently gaining more attention due to the ubiquity of RGB-D sensors. However, even most recent approaches focus on the case of a single isolated hand. In this work, we focus on hands that interact with other hands or objects and present a framework that successfully captures motion in such interaction scenarios for both rigid and articulated objects. Our framework combines a generative model with discriminatively trained salient points to achieve a low tracking error and with collision detection and physics simulation to achieve physically plausible estimates even in case of occlusions and missing visual data. Since all components are unified in a single objective function which is almost everywhere differentiable, it can be optimized with standard optimization techniques. Our approach works for monocular RGB-D sequences as well as setups with multiple synchronized RGB cameras. For a qualitative and quantitative evaluation, we captured 29 sequences with a large variety of interactions and up to 150 degrees of freedom.

[1]  John F. Canny,et al.  A Computational Approach to Edge Detection , 1986, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[2]  Marc Levoy,et al.  Real-time 3D model acquisition , 2002, ACM Trans. Graph..

[3]  Frédo Durand,et al.  A Fast Approximation of the Bilateral Filter Using a Signal Processing Approach , 2006, International Journal of Computer Vision.

[4]  Tae-Kyun Kim,et al.  Latent Regression Forest: Structured Estimation of 3D Articulated Hand Posture , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[5]  Luc Van Gool,et al.  Tracking a hand manipulating an object , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[6]  Hans-Peter Seidel,et al.  Real-Time Hand Tracking Using a Sum of Anisotropic Gaussians Model , 2014, 2014 2nd International Conference on 3D Vision.

[7]  Jorge Stolfi,et al.  Oriented projective geometry , 1987, SCG '87.

[8]  Luc Van Gool,et al.  Hough Forests for Object Detection, Tracking, and Action Recognition , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[9]  Olivier D. Faugeras,et al.  3D Articulated Models and Multiview Tracking with Physical Forces , 2001, Comput. Vis. Image Underst..

[10]  Antonis A. Argyros,et al.  Full DOF tracking of a hand interacting with an object by modeling occlusions and physical constraints , 2011, 2011 International Conference on Computer Vision.

[11]  Antonis A. Argyros,et al.  Scalable 3D Tracking of Multiple Interacting Objects , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[12]  Antonis A. Argyros,et al.  Efficient model-based 3D tracking of hand articulations using Kinect , 2011, BMVC.

[13]  Dieter Fox,et al.  DART: Dense Articulated Real-Time Tracking , 2014, Robotics: Science and Systems.

[14]  Jorge Stolfi,et al.  Oriented Projective Geometry: A Framework for Geometric Computations , 2014 .

[15]  Luc Van Gool,et al.  Functional categorization of objects using real-time markerless motion capture , 2011, CVPR 2011.

[16]  Paulo R. S. Mendonça,et al.  Model-based 3D tracking of an articulated hand , 2001, Proceedings of the 2001 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. CVPR 2001.

[17]  Mircea Nicolescu,et al.  Vision-based hand pose estimation: A review , 2007, Comput. Vis. Image Underst..

[18]  Danica Kragic,et al.  Monocular real-time 3D articulated hand pose estimation , 2009, 2009 9th IEEE-RAS International Conference on Humanoid Robots.

[19]  Antonis A. Argyros,et al.  Physically Plausible 3D Scene Tracking: The Single Actor Hypothesis , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[20]  Luca Ballan,et al.  Marker-less motion capture of skinned models in a four camera set-up using optical flow and silhouettes , 2008 .

[21]  Toby Sharp,et al.  Real-time human pose recognition in parts from single depth images , 2011, CVPR.

[22]  Ying Wu,et al.  Capturing natural hand articulation , 2001, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001.

[23]  Marc Levoy,et al.  Efficient variants of the ICP algorithm , 2001, Proceedings Third International Conference on 3-D Digital Imaging and Modeling.

[24]  Takeo Kanade,et al.  Model-based tracking of self-occluding articulated objects , 1995, Proceedings of IEEE International Conference on Computer Vision.

[25]  Ken Perlin,et al.  Real-Time Continuous Pose Recovery of Human Hands Using Convolutional Networks , 2014, ACM Trans. Graph..

[26]  Lale Akarun,et al.  Hand Pose Estimation and Hand Shape Classification Using Multi-layered Randomized Decision Forests , 2012, ECCV.

[27]  David J. Fleet,et al.  Model-Based 3D Hand Pose Estimation from Monocular Video , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[28]  Varun Ramakrishna,et al.  User-Specific Hand Modeling from Monocular Depth Sequences , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[29]  John P. Lewis,et al.  Pose Space Deformation: A Unified Approach to Shape Interpolation and Skeleton-Driven Deformation , 2000, SIGGRAPH.

[30]  Danica Kragic,et al.  Hands in action: real-time 3D reconstruction of hands in interaction with objects , 2010, 2010 IEEE International Conference on Robotics and Automation.

[31]  Hans-Peter Seidel,et al.  Eurographics/siggraph Symposium on Computer Animation (2003) Construction and Animation of Anatomically Based Human Hand Models , 2022 .

[32]  David W. Murray,et al.  Regression-based Hand Pose Estimation from Multiple Cameras , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[33]  Daniel P. Huttenlocher,et al.  Distance Transforms of Sampled Functions , 2012, Theory Comput..

[34]  Andrew W. Fitzgibbon,et al.  Accurate, Robust, and Flexible Real-time Hand Tracking , 2015, CHI.

[35]  Bodo Rosenhahn,et al.  Ieee Transactions on Pattern Analysis and Machine Intelligence Combined Region-and Motion-based 3d Tracking of Rigid and Articulated Objects , 2022 .

[36]  Shan Lu,et al.  Using multiple cues for hand tracking and model refinement , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[37]  Tae-Kyun Kim,et al.  Real-Time Articulated Hand Pose Estimation Using Semi-supervised Transductive Regression Forests , 2013, 2013 IEEE International Conference on Computer Vision.

[38]  Chen Qian,et al.  Realtime and Robust Hand Tracking from Depth , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[39]  Bodo Rosenhahn,et al.  Three-Dimensional Shape Knowledge for Joint Image Segmentation and Pose Tracking , 2007, International Journal of Computer Vision.

[40]  Antonis A. Argyros,et al.  Tracking the articulated motion of two strongly interacting hands , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[41]  Jovan Popović,et al.  Real-time hand-tracking with a color glove , 2009, SIGGRAPH 2009.

[42]  Danica Kragic,et al.  Grasp Recognition for Programming by Demonstration , 2005, Proceedings of the 2005 IEEE International Conference on Robotics and Automation.

[43]  Jitendra Malik,et al.  Twist Based Acquisition and Tracking of Animal and Human Kinematics , 2004, International Journal of Computer Vision.

[44]  Luc Van Gool,et al.  The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[45]  Gérard G. Medioni,et al.  Object modeling by registration of multiple range images , 1991, Proceedings. 1991 IEEE International Conference on Robotics and Automation.

[46]  Luc Van Gool,et al.  An object-dependent hand pose prior from sparse training data , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[47]  Gabriel Zachmann,et al.  Collision Detection for Deformable Objects , 2004, Comput. Graph. Forum.

[48]  Antti Oulasvirta,et al.  Fast and robust hand tracking using detection-guided optimization , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[49]  Manolis I. A. Lourakis,et al.  Evolutionary Quasi-Random Search for Hand Articulations Tracking , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[50]  Stan Sclaroff,et al.  Estimating 3D hand pose from a cluttered image , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[51]  Mohammad Ali Nekouie 3D Human Hand Posture Reconstruction Using a Single 2D Image , 2011 .

[52]  James M. Rehg,et al.  Statistical Color Models with Application to Skin Detection , 2004, International Journal of Computer Vision.

[53]  Björn Stenger,et al.  Shape context and chamfer matching in cluttered scenes , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[54]  Luc Van Gool,et al.  Smart particle filtering for high-dimensional tracking , 2007, Comput. Vis. Image Underst..

[55]  Jitendra Malik,et al.  Shape matching and object recognition using shape contexts , 2010, 2010 3rd International Conference on Computer Science and Information Technology.

[56]  Jovan Popovic,et al.  Automatic rigging and animation of 3D characters , 2007, ACM Trans. Graph..

[57]  David C. Hogg,et al.  Towards 3D hand tracking using a deformable model , 1996, Proceedings of the Second International Conference on Automatic Face and Gesture Recognition.

[58]  Nassir Navab,et al.  Adaptive neighborhood selection for real-time surface normal estimation from organized point cloud data using integral images , 2012, 2012 IEEE/RSJ International Conference on Intelligent Robots and Systems.

[59]  Bodo Rosenhahn,et al.  Model-Based Pose Estimation , 2011, Visual Analysis of Humans.

[60]  Takeo Kanade,et al.  Visual Tracking of High DOF Articulated Structures: an Application to Human Hand Tracking , 1994, ECCV.

[61]  Bernd Gärtner,et al.  An efficient, exact, and generic quadratic programming solver for geometric optimization , 2000, SCG '00.

[62]  Rómer Rosales,et al.  3D Hand Pose Reconstruction Using Specialized Mappings , 2001, ICCV.

[63]  Patrick Olivier,et al.  Digits: freehand 3D interactions anywhere using a wrist-worn gloveless sensor , 2012, UIST.

[64]  Dimitrios Tzionas,et al.  A Comparison of Directional Distances for Hand Pose Estimation , 2013, GCPR.

[65]  Alok Aggarwal,et al.  Geometric applications of a matrix-searching algorithm , 1987, SCG '86.

[66]  Michael Isard,et al.  Partitioned Sampling, Articulated Objects, and Interface-Quality Hand Tracking , 2000, ECCV.

[67]  Antti Oulasvirta,et al.  Interactive Markerless Articulated Hand Motion Tracking Using RGB and Depth Data , 2013, 2013 IEEE International Conference on Computer Vision.

[68]  Dimitrios Tzionas,et al.  Capturing Hand Motion with an RGB-D Sensor, Fusing a Generative Model with Salient Points , 2014, GCPR.

[69]  Richard M. Murray,et al.  A Mathematical Introduction to Robotic Manipulation , 1994 .

[70]  Luc Van Gool,et al.  Motion Capture of Hands in Action Using Discriminative Salient Points , 2012, ECCV.

[71]  Michael I. Mandel,et al.  Visual Hand Tracking Using Nonparametric Belief Propagation , 2004, 2004 Conference on Computer Vision and Pattern Recognition Workshop.

[72]  Qing Zhang,et al.  A Survey on Human Motion Analysis from Depth Data , 2013, Time-of-Flight and Depth Imaging.