Scale and Rotation Invariant Approach to Tracking Human Body Part Regions in Videos

We propose a novel scale and rotation invariant method for tracking a human subject's body part regions in cluttered videos. The method optimizes the assembly of body part region proposals under the spatial and temporal constraints of a human body plan. Achieving scale and rotation invariance requires the human body part graph to be loopy, which makes efficient optimization of the body part region assembly a great challenge. We address this with dynamic programming: in each video frame, we find the N-best whole body configurations from the loopy structure. The N-best configurations are then used to construct trellises, and we track human body part regions by finding shortest paths on these trellises. Our experiments on a variety of videos show that the proposed method is efficient, accurate, and robust to object appearance variations, scale and rotation changes, and background clutter.
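
The tracking step described above, a shortest path through a trellis built from per-frame N-best configurations, can be illustrated with a minimal dynamic programming sketch. Everything named here is an assumption made for illustration and is not taken from the paper: `unary[t][i]` stands for a hypothetical cost of the i-th candidate configuration in frame t, and `transition(t, i, j)` for a hypothetical temporal consistency cost between candidate i in frame t and candidate j in frame t + 1. The per-frame N-best search on the loopy body part graph is not shown.

```python
from typing import Callable, List, Sequence, Tuple


def shortest_trellis_path(
    unary: Sequence[Sequence[float]],
    transition: Callable[[int, int, int], float],
) -> Tuple[List[int], float]:
    """Return the minimum-cost candidate index per frame and the total cost.

    unary[t][i]: assumed cost of candidate i in frame t (hypothetical).
    transition(t, i, j): assumed cost of moving from candidate i in frame t
    to candidate j in frame t + 1 (hypothetical).
    """
    if not unary or any(len(layer) == 0 for layer in unary):
        raise ValueError("every frame needs at least one candidate")

    # cost[i] is the best cost of any path ending at candidate i of the
    # current frame.
    cost = list(unary[0])
    back: List[List[int]] = []  # back[t-1][j]: best predecessor of j in frame t

    for t in range(1, len(unary)):
        new_cost: List[float] = []
        pointers: List[int] = []
        for j, u in enumerate(unary[t]):
            best_i = min(
                range(len(cost)),
                key=lambda i: cost[i] + transition(t - 1, i, j),
            )
            new_cost.append(cost[best_i] + transition(t - 1, best_i, j) + u)
            pointers.append(best_i)
        cost = new_cost
        back.append(pointers)

    # Backtrack from the cheapest candidate in the last frame.
    end = min(range(len(cost)), key=cost.__getitem__)
    path = [end]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, cost[end]
```

With T frames and N candidates per frame, this sketch runs in O(T N^2) time, which is why keeping only the N-best configurations per frame keeps the temporal search tractable.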
