Multi-Scale Human Pose Tracking in 2D Monocular Images

In this paper we address the problem of tracking human poses in multiple perspective scales in 2D monocular images/videos. In most state-of-the-art 2D tracking approaches, the issue of scale variation is rarely discussed. However in reality, videos often contain human motion with dynamically changed scales. In this paper we propose a tracking framework that can deal with this problem. A scale checking and adjusting algorithm is proposed to automatically adjust the perspective scales during the tracking process. Two metrics are proposed for detecting and adjusting the scale change. One metric is from the height value of the tracked target, which is suitable for some sequences where the tracked target is upright and with no limbs stretching. The other metric employed in this algorithm is more generic, which is invariant to motion types. It is the ratio between the pixel counts of the target silhouette and the detected bounding boxes of the target body. The proposed algorithm is tested on the publicly available datasets (HumanEva). The experimental results show that our method demonstrated higher accuracy and efficiency compared to state-of-the-art approaches.

[1]  Andrew Zisserman,et al.  Progressive search space reduction for human pose estimation , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[2]  Martin A. Fischler,et al.  The Representation and Matching of Pictorial Structures , 1973, IEEE Transactions on Computers.

[3]  Huosheng Hu,et al.  Human motion tracking for rehabilitation - A survey , 2008, Biomed. Signal Process. Control..

[4]  Rashid Ansari,et al.  Cyclic articulated human motion tracking by sequential ancestral simulation , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[5]  David A. Forsyth,et al.  Tracking People by Learning Their Appearance , 2007, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[6]  Andrew Zisserman,et al.  2D Articulated Human Pose Estimation and Retrieval in (Almost) Unconstrained Still Images , 2012, International Journal of Computer Vision.

[7]  Michael J. Black,et al.  Automatic Detection and Tracking of Human Motion with a View-Based Representation , 2002, ECCV.

[8]  W. Eric L. Grimson,et al.  Adaptive background mixture models for real-time tracking , 1999, Proceedings. 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No PR00149).

[9]  Yee Whye Teh,et al.  Learning to Parse Images , 1999, NIPS.

[10]  Daniel P. Huttenlocher,et al.  Efficient matching of pictorial structures , 2000, Proceedings IEEE Conference on Computer Vision and Pattern Recognition. CVPR 2000 (Cat. No.PR00662).

[11]  Daniel P. Huttenlocher,et al.  Pictorial Structures for Object Recognition , 2004, International Journal of Computer Vision.

[12]  Jitendra Malik,et al.  Shape matching and object recognition using shape contexts , 2010, 2010 3rd International Conference on Computer Science and Information Technology.

[13]  Deva Ramanan,et al.  Learning to parse images of articulated bodies , 2006, NIPS.

[14]  Daniel P. Huttenlocher,et al.  Beyond trees: common-factor models for 2D human pose recovery , 2005, Tenth IEEE International Conference on Computer Vision (ICCV'05) Volume 1.

[15]  David A. Forsyth,et al.  Finding and tracking people from the bottom up , 2003, 2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003. Proceedings..

[16]  Jean-Christophe Nebel,et al.  Tracking Human Position and Lower Body Parts Using Kalman and Particle Filters Constrained by Human Biomechanics , 2011, IEEE Transactions on Systems, Man, and Cybernetics, Part B (Cybernetics).

[17]  Bernt Schiele,et al.  Pictorial structures revisited: People detection and articulated pose estimation , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[18]  Ling Li,et al.  Human pose tracking based on both generic and specific appearance models , 2012, 2012 12th International Conference on Control Automation Robotics & Vision (ICARCV).

[19]  Bernt Schiele,et al.  Discriminative Appearance Models for Pictorial Structures , 2011, International Journal of Computer Vision.

[20]  P. KaewTrakulPong,et al.  An Improved Adaptive Background Mixture Model for Real-time Tracking with Shadow Detection , 2002 .

[21]  Michael J. Black,et al.  HumanEva: Synchronized Video and Motion Capture Dataset and Baseline Algorithm for Evaluation of Articulated Human Motion , 2010, International Journal of Computer Vision.

[22]  Ronald Poppe,et al.  Vision-based human motion analysis: An overview , 2007, Comput. Vis. Image Underst..

[23]  Yoav Freund,et al.  A decision-theoretic generalization of on-line learning and an application to boosting , 1995, EuroCOLT.