A Robust Likelihood Function for 3D Human Pose Tracking

Recent work on unsupervised 3D human pose tracking typically focuses on improving the optimization framework (i.e., the tracker) so that it finds a better maximum of the likelihood function. In contrast, in this paper we focus on improving the likelihood function itself, making it more robust and less ambiguous and thereby making the optimization task easier. In particular, we propose an exponential chamfer distance for model matching that is robust to small pose changes, and a part-based model that is better able to localize partially occluded and overlapping parts. Using a standard annealing particle filter and a simple diffusion motion model, the proposed likelihood function obtains significantly lower error than other unsupervised tracking methods on the HumanEva dataset. Noting that the joint system of the tracker's body model differs from the joint system of the motion capture ground-truth model, we propose a novel method for transforming between the two joint systems. With this bias correction applied, our part-based likelihood obtains results equivalent to state-of-the-art supervised tracking methods.
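
To make the idea of an exponential chamfer distance concrete, the following is a minimal sketch of one plausible form, not the definition used in the paper. It assumes that each projected model point is matched to its nearest image feature point and that the distance is passed through a saturating exponential, so that small misalignments change the cost gently and large ones are capped. The function names, the scale parameter `sigma`, and the `1 - exp(-d / sigma)` form are all assumptions made for illustration.

```python
import numpy as np


def nearest_distances(model_pts, image_pts):
    """Euclidean distance from each model point to its nearest image point.

    model_pts: (M, 2) array of projected model points (hypothetical input).
    image_pts: (N, 2) array of image feature points, e.g. edge pixels.
    """
    diffs = model_pts[:, None, :] - image_pts[None, :, :]  # shape (M, N, 2)
    return np.sqrt((diffs ** 2).sum(axis=2)).min(axis=1)   # shape (M,)


def exp_chamfer_cost(model_pts, image_pts, sigma=5.0):
    """Assumed exponential chamfer cost, bounded in [0, 1].

    Each nearest-neighbour distance d is mapped to 1 - exp(-d / sigma),
    so the cost saturates for badly mismatched points instead of growing
    without bound. This is an illustrative choice, not the paper's formula.
    """
    d = nearest_distances(model_pts, image_pts)
    return float(np.mean(1.0 - np.exp(-d / sigma)))


if __name__ == "__main__":
    model = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])
    shifted = model + np.array([0.5, 0.0])  # small misalignment
    print(exp_chamfer_cost(model, model))
    print(exp_chamfer_cost(model, shifted))
```

Under these assumptions, a likelihood could then be taken as proportional to `exp(-cost)`; the appropriate scaling and the choice of `sigma` would depend on image resolution and are left open here.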
