Drift-free tracking of rigid and articulated objects

Model-based 3D tracker estimate the position, rotation, and joint angles of a given model from video data of one or multiple cameras. They often rely on image features that are tracked over time but the accumulation of small errors results in a drift away from the target object. In this work, we address the drift problem for the challenging task of human motion capture and tracking in the presence of multiple moving objects where the error accumulation becomes even more problematic due to occlusions. To this end, we propose an analysis-by-synthesis framework for articulated models. It combines the complementary concepts of patch-based and region-based matching to track both structured and homogeneous body parts. The performance of our method is demonstrated for rigid bodies, body parts, and full human bodies where the sequences contain fast movements, self-occlusions, multiple moving objects, and clutter. We also provide a quantitative error analysis and comparison with other model-based approaches.

[1]  Larry S. Davis,et al.  3-D model-based tracking of humans in action: a multi-view approach , 1996, Proceedings CVPR IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[2]  Zhengyou Zhang,et al.  Iterative point matching for registration of free-form curves and surfaces , 1994, International Journal of Computer Vision.

[3]  Hans-Peter Seidel,et al.  Combining 3D flow fields with silhouette-based human motion capture for immersive video , 2004, Graph. Model..

[4]  Dimitris N. Metaxas,et al.  Optical Flow Constraints on Deformable Models with Applications to Face Tracking , 2000, International Journal of Computer Vision.

[5]  John F. Hughes Oriented Projective Geometry: A Framework for Geometric Computations (Jorge Stolfi) , 1993, SIAM Rev..

[6]  Pertti Roivainen,et al.  3-D Motion Estimation in Model-Based Facial Image Coding , 1993, IEEE Trans. Pattern Anal. Mach. Intell..

[7]  Michael J. Black,et al.  HumanEva: Synchronized Video and Motion Capture Dataset and Baseline Algorithm for Evaluation of Articulated Human Motion , 2010, International Journal of Computer Vision.

[8]  Cordelia Schmid,et al.  A Performance Evaluation of Local Descriptors , 2005, IEEE Trans. Pattern Anal. Mach. Intell..

[9]  Hans-Peter Seidel,et al.  High Accuracy Optical Flow Serves 3-D Pose Tracking: Exploiting Contour and Flow Based Constraints , 2006, ECCV.

[10]  Bodo Rosenhahn,et al.  Three-Dimensional Shape Knowledge for Joint Image Segmentation and Pose Tracking , 2007, International Journal of Computer Vision.

[11]  Marco La Cascia,et al.  Fast, Reliable Head Tracking under Varying Illumination: An Approach Based on Registration of Texture-Mapped 3D Models , 2000, IEEE Trans. Pattern Anal. Mach. Intell..

[12]  Christian Rössl,et al.  Discrete tensorial quasi-harmonic maps , 2005, International Conference on Shape Modeling and Applications 2005 (SMI' 05).

[13]  Hans-Peter Seidel,et al.  Clustered Stochastic Optimization for Object Recognition and Pose Estimation , 2007, DAGM-Symposium.

[14]  Jitendra Malik,et al.  Twist Based Acquisition and Tracking of Animal and Human Kinematics , 2004, International Journal of Computer Vision.

[15]  Vincent Lepetit,et al.  Point matching as a classification problem for fast and robust object pose estimation , 2004, CVPR 2004.

[16]  Thomas Brox,et al.  Universität Des Saarlandes Fachrichtung 6.1 – Mathematik Highly Accurate Optic Flow Computation with Theoretically Justified Warping Highly Accurate Optic Flow Computation with Theoretically Justified Warping , 2022 .

[17]  Edward H. Adelson,et al.  A multiresolution spline with application to image mosaics , 1983, TOGS.

[18]  Stephen J. Sangwine,et al.  Hypercomplex auto- and cross-correlation of color images , 1999, Proceedings 1999 International Conference on Image Processing (Cat. 99CH36348).

[19]  Michael J. Black,et al.  HumanEva: Synchronized Video and Motion Capture Dataset for Evaluation of Articulated Human Motion , 2006 .

[20]  David G. Lowe,et al.  Object recognition from local scale-invariant features , 1999, Proceedings of the Seventh IEEE International Conference on Computer Vision.

[21]  Vincent Lepetit,et al.  Randomized trees for real-time keypoint recognition , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[22]  Yan Ke,et al.  PCA-SIFT: a more distinctive representation for local image descriptors , 2004, CVPR 2004.

[23]  Bodo Rosenhahn,et al.  Nonparametric Density Estimation for Human Pose Tracking , 2006, DAGM-Symposium.

[24]  Pushmeet Kohli,et al.  PoseCut: Simultaneous Segmentation and 3D Pose Estimation of Humans Using Dynamic Graph-Cuts , 2006, ECCV.

[25]  Takeo Kanade,et al.  An Iterative Image Registration Technique with an Application to Stereo Vision , 1981, IJCAI.

[26]  Michael J. Black,et al.  An Adaptive Appearance Model Approach for Model-based Articulated Object Tracking , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[27]  Carlo Tomasi,et al.  Good features to track , 1994, 1994 Proceedings of IEEE Conference on Computer Vision and Pattern Recognition.