Monocular Human Motion Tracking with Non-Connected Body Part Dependency

2D articulated human pose tracking in monocular image sequences remains an extremely challenging task due to background clutter, variation in body appearance, occlusion and imaging conditions. Most current approaches model only simple appearance and dependencies between adjacent (connected) body parts, typically through a tree-structured prior over body part connections. Such a prior excludes dependencies between non-connected body parts, which could contribute to tracking accuracy. Building on the successful pictorial structures model, we propose a novel framework for human pose tracking that also models dependencies between non-connected body parts. To perform inference in the proposed model efficiently, we introduce a factor graph that factorizes all the unary terms and all the dependencies modelled in the pairwise term. We also observe that the posterior marginal of each part under the tree-structured model follows a Gaussian distribution. Based on this property, sampling becomes straightforward and the search space can be shrunk effectively. In addition, we introduce a full-body detector as the first step of our framework to further reduce the search space for pose tracking. Since the positions and orientations of body parts usually change smoothly between consecutive frames, we exploit this temporal continuity by incorporating constraints on the location distance and the orientation difference of each body part between two successive frames. We evaluate our framework on two challenging image sequences and conduct a series of experiments comparing its performance with approaches based on the tree-structured model. The results show that the proposed framework significantly improves performance.
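The abstract gives no formulas, so the following is only a minimal Python sketch of two of the ideas it describes, under our own assumptions. We assume each part's state is (x, y, θ), that the tree-model posterior marginal over a part's position is available as a discrete grid, that the Gaussian is fitted by moment matching, that the search space is shrunk by a Mahalanobis gate around that Gaussian, and that the motion constraint is a weighted sum of Euclidean location distance and wrapped orientation difference. All function names and the weights `w_loc`, `w_ori` are hypothetical; the sketch does not implement the factor-graph inference itself.

```python
import math
import random


def fit_gaussian(marginal):
    """Moment-match a 2D Gaussian to a discrete marginal over part positions.

    marginal: dict mapping (x, y) -> probability mass (need not be normalised).
    Returns (mean, cov) with mean = (mx, my) and cov = ((sxx, sxy), (sxy, syy)).
    """
    total = sum(marginal.values())
    mx = sum(p * x for (x, _), p in marginal.items()) / total
    my = sum(p * y for (_, y), p in marginal.items()) / total
    sxx = sum(p * (x - mx) ** 2 for (x, _), p in marginal.items()) / total
    syy = sum(p * (y - my) ** 2 for (_, y), p in marginal.items()) / total
    sxy = sum(p * (x - mx) * (y - my) for (x, y), p in marginal.items()) / total
    return (mx, my), ((sxx, sxy), (sxy, syy))


def in_gate(point, mean, cov, k=3.0):
    """Keep a state only if its Mahalanobis distance to the mean is <= k."""
    (sxx, sxy), (_, syy) = cov
    det = sxx * syy - sxy ** 2
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    # Quadratic form d^T Sigma^{-1} d using the closed-form 2x2 inverse.
    d2 = (syy * dx * dx - 2 * sxy * dx * dy + sxx * dy * dy) / det
    return d2 <= k * k


def sample_candidates(mean, cov, n, rng, eps=1e-9):
    """Draw n (x, y) samples from N(mean, cov) via a 2x2 Cholesky factor."""
    (sxx, sxy), (_, syy) = cov
    l11 = math.sqrt(max(sxx, eps))
    l21 = sxy / l11
    l22 = math.sqrt(max(syy - l21 ** 2, eps))
    out = []
    for _ in range(n):
        z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        out.append((mean[0] + l11 * z1, mean[1] + l21 * z1 + l22 * z2))
    return out


def angle_diff(a, b):
    """Smallest absolute difference between two angles, in radians."""
    d = (a - b) % (2 * math.pi)
    return min(d, 2 * math.pi - d)


def temporal_cost(prev, cur, w_loc=1.0, w_ori=1.0):
    """Hypothetical motion penalty between a part's states in successive frames."""
    (px, py, pt), (cx, cy, ct) = prev, cur
    return w_loc * math.hypot(cx - px, cy - py) + w_ori * angle_diff(ct, pt)


if __name__ == "__main__":
    rng = random.Random(0)
    # Toy stand-in for one part's tree-model marginal on a 21x21 grid.
    marginal = {(x, y): math.exp(-((x - 10) ** 2 / 8 + (y - 12) ** 2 / 2))
                for x in range(21) for y in range(21)}
    mean, cov = fit_gaussian(marginal)
    kept = [s for s in marginal if in_gate(s, mean, cov, k=2.0)]
    print(f"search space: {len(marginal)} -> {len(kept)} grid states")
    # Hypothetical previous-frame state (x, y, theta) of the same part.
    prev = (9.0, 12.0, 0.3)
    cands = [(x, y, rng.uniform(0.0, 0.6))
             for x, y in sample_candidates(mean, cov, 50, rng)]
    best = min(cands, key=lambda c: temporal_cost(prev, c))
    print("candidate closest to previous frame:", best)
```

Moment matching is used here because it is the simplest way to turn a discrete marginal into Gaussian parameters; the gate and the sampler are two alternative ways of exploiting that Gaussian to reduce the set of states passed on to inference.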