Breaking the Chain: Liberation from the Temporal Markov Assumption for Tracking Human Poses

We present an approach to multi-target tracking that has expressive potential beyond the capabilities of chain-shaped hidden Markov models, yet has significantly reduced complexity. Our framework, which we call tracking-by-selection, is similar to tracking-by-detection in that it separates the tasks of detection and tracking, but it shifts temporal reasoning from the tracking stage to the detection stage. The core feature of tracking-by-selection is that it reasons about path hypotheses that traverse the entire video instead of a chain of single-frame object hypotheses. A traditional chain-shaped tracking-by-detection model is only able to promote consistency between one frame and the next. In tracking-by-selection, path hypotheses exist across time, and encouraging long-term temporal consistency is as simple as rewarding path hypotheses with consistent image features. One additional advantage of tracking-by-selection is that it results in a dramatically simplified model that can be solved exactly. We adapt an existing tracking-by-detection model to the tracking-by-selection framework, and show improved performance on a challenging dataset.

[1]  Deva Ramanan,et al.  Robust Tracking of the Upper Limb for Functional Stroke Assessment , 2010, IEEE Transactions on Neural Systems and Rehabilitation Engineering.

[2]  Leonid Sigal,et al.  Human Context: Modeling Human-Human Interactions for Monocular 3D Pose Estimation , 2012, AMDO.

[3]  Stefan Roth,et al.  People-tracking-by-detection and people-detection-by-tracking , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[4]  Deva Ramanan,et al.  Learning to parse images of articulated bodies , 2006, NIPS.

[5]  Ben Taskar,et al.  Cascaded Models for Articulated Pose Estimation , 2010, ECCV.

[6]  David J. Fleet,et al.  3D People Tracking with Gaussian Process Dynamical Models , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[7]  Daniel P. Huttenlocher,et al.  Pictorial Structures for Object Recognition , 2004, International Journal of Computer Vision.

[8]  Vittorio Ferrari,et al.  Better Appearance Models for Pictorial Structures , 2009, BMVC.

[9]  Varun Ramakrishna,et al.  Tracking Human Pose by Tracking Symmetric Parts , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[10]  Luc Van Gool,et al.  Coupled Detection and Trajectory Estimation for Multi-Object Tracking , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[11]  Silvio Savarese,et al.  Articulated part-based model for joint object detection and pose estimation , 2011, 2011 International Conference on Computer Vision.

[12]  Bernt Schiele,et al.  Pictorial structures revisited: People detection and articulated pose estimation , 2009, CVPR.

[13]  Thomas Hofmann,et al.  Support vector machine learning for interdependent and structured output spaces , 2004, ICML.

[14]  Afshin Dehghan,et al.  GMCP-Tracker: Global Multi-object Tracking Using Generalized Minimum Clique Graphs , 2012, ECCV.

[15]  Andrew W. Fitzgibbon,et al.  Real-time human pose recognition in parts from single depth images , 2011, CVPR 2011.

[16]  Daniel P. Huttenlocher,et al.  Beyond trees: common-factor models for 2D human pose recovery , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[17]  Luc Van Gool,et al.  Articulated Multi-body Tracking under Egomotion , 2008, ECCV.

[18]  Ben Taskar,et al.  Parsing human motion with stretchable models , 2011, CVPR 2011.

[19]  David A. Forsyth,et al.  Finding and tracking people from the bottom up , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[20]  Yi Yang,et al.  Articulated pose estimation with flexible mixtures-of-parts , 2011, CVPR 2011.

[21]  Ben Taskar,et al.  Sidestepping Intractable Inference with Structured Ensemble Cascades , 2010, NIPS.

[22]  David A. Forsyth,et al.  Human Tracking with Mixtures of Trees , 2001, ICCV.

[23]  Silvio Savarese,et al.  An efficient branch-and-bound algorithm for optimal human pose estimation , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[24]  Andrew Zisserman,et al.  Progressive search space reduction for human pose estimation , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[25]  Sidharth Bhatia,et al.  Tracking loose-limbed people , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[26]  Mei Han,et al.  Efficient hierarchical graph-based video segmentation , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.