Background Suppression for Building Accurate Appearance Models in Human Motion Tracking

This paper presents a robust, fully automatic human motion tracking system that uses a camera in a fixed location and requires no motion prior information. Bottom-up estimation approaches have recently been applied to such tasks with some success, but their performance is limited by the difficulty of building an effective appearance model. In particular, the appearance model must be derived from initial estimates of the tracked person's limb posture. These initial estimates are inaccurate, and the precise shape, size and boundaries of the tracked person's limbs are not known, so background (non-limb) pixels are inevitably included in the appearance model. For smaller limbs such as the arms, this can make the model unrepresentative and can sometimes cause it to be confused with other body parts such as the torso. In this paper, we address the problem of automatically extracting accurate training samples for the appearance model, and propose a mechanism that identifies and removes background (negative) pixels via pixel clustering and remains robust even with a loose-fitting body shape model. Experiments on several publicly available data sets compare the proposed approach against existing appearance-based algorithms that do not remove negative pixels. The results show that tracking accuracy improves consistently, and significantly so for small limbs such as the arms.
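The idea of removing background pixels by clustering can be illustrated with a minimal sketch. The abstract does not specify the clustering algorithm, the colour features, or the rule for deciding which cluster is background, so everything below is an assumption made for illustration: a plain k-means on RGB values, and a hypothetical rule that discards the cluster whose centroid lies closest to the mean colour of pixels sampled outside the loose limb region. The function names `kmeans` and `remove_background_pixels` are likewise invented here and are not taken from the paper.

```python
import numpy as np


def _assign(points, centroids):
    # Label each point with the index of its nearest centroid.
    dists = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return dists.argmin(axis=1)


def kmeans(points, k=2, iters=20):
    """Plain Lloyd's k-means with deterministic farthest-first initialisation."""
    points = np.asarray(points, dtype=float)
    centroids = [points[0]]
    for _ in range(1, k):
        # Next seed: the point farthest from all centroids chosen so far.
        d = np.min([np.linalg.norm(points - c, axis=1) for c in centroids], axis=0)
        centroids.append(points[d.argmax()])
    centroids = np.array(centroids)
    for _ in range(iters):
        labels = _assign(points, centroids)
        for j in range(k):
            members = points[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    return _assign(points, centroids), centroids


def remove_background_pixels(region_pixels, background_pixels, k=2):
    """Drop the cluster of region pixels that most resembles the background.

    region_pixels:     (N, 3) colours sampled inside a loose limb region.
    background_pixels: (M, 3) colours sampled outside it (assumed available).
    Returns the kept pixels and a boolean mask over region_pixels.
    """
    region_pixels = np.asarray(region_pixels, dtype=float)
    background_pixels = np.asarray(background_pixels, dtype=float)
    labels, centroids = kmeans(region_pixels, k)
    bg_mean = background_pixels.mean(axis=0)
    # Hypothetical rule: the cluster nearest the background mean is "negative".
    bg_cluster = np.linalg.norm(centroids - bg_mean, axis=1).argmin()
    keep = labels != bg_cluster
    return region_pixels[keep], keep
```

The kept pixels would then serve as training samples for the limb's appearance model. With a loose-fitting region, the background cluster can be large, which is why the sketch compares clusters against external background samples rather than simply keeping the largest cluster.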
