Learning human pose in crowd

In a crowded public space, body and head pose can provide useful information for understanding human behaviours and intentions. In this paper, we propose a novel framework for locating people and inferring their body and head poses. Human detection and pose estimation are closely related problems, yet previous studies have tackled them independently. In this work, we advocate joint detection and recognition of both head and body poses. Our framework is based on learning an ensemble of pose-sensitive human body models whose outputs provide a new representation for poses. To avoid tedious and inconsistent manual annotation when learning pose-sensitive models, we formulate a semi-supervised learning method for model training, which bootstraps an initial model from a small set of labelled data and then improves the model iteratively by mining a large unlabelled dataset. Experiments using data from a busy underground station demonstrate that the proposed method significantly outperforms a state-of-the-art person detector and yields extremely accurate head and body pose estimation in crowded public spaces.
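
The bootstrapping idea described above (train on a few labelled examples, then repeatedly absorb confidently labelled examples from an unlabelled pool) can be illustrated with a generic self-training loop. The following is only a minimal sketch under stated assumptions, not the method of the paper: the nearest-centroid classifier, the margin-based confidence rule, the `margin_threshold` parameter, the two pose labels, and the 2-D toy features are all hypothetical stand-ins, since the abstract does not specify the models, features, or selection criterion.

```python
import random


def centroids(points, labels):
    """Mean feature vector per class label (a stand-in for a pose model)."""
    sums, counts = {}, {}
    for p, y in zip(points, labels):
        acc = sums.setdefault(y, [0.0] * len(p))
        for i, v in enumerate(p):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}


def predict(model, p):
    """Return (label, margin): nearest centroid and the gap to the runner-up."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(p, c)) ** 0.5, y)
        for y, c in model.items()
    )
    margin = dists[1][0] - dists[0][0] if len(dists) > 1 else float("inf")
    return dists[0][1], margin


def self_train(labelled, labels, unlabelled, margin_threshold=1.0, max_rounds=10):
    """Bootstrap from labelled data, then iteratively mine the unlabelled pool."""
    points, ys = list(labelled), list(labels)
    pool = list(unlabelled)
    model = centroids(points, ys)  # initial model from the small labelled set
    for _ in range(max_rounds):
        confident, rest = [], []
        for p in pool:
            y, m = predict(model, p)
            (confident if m >= margin_threshold else rest).append((p, y))
        if not confident:
            break  # nothing new to learn from
        for p, y in confident:  # adopt confident predictions as pseudo-labels
            points.append(p)
            ys.append(y)
        pool = [p for p, _ in rest]
        model = centroids(points, ys)  # retrain on the enlarged training set
    return model, len(pool)


# Toy data: two hypothetical pose classes as well-separated 2-D clusters.
rng = random.Random(0)
seed_points = [(0.2, -0.1), (-0.3, 0.1), (5.1, 4.8), (4.7, 5.2)]
seed_labels = ["front", "front", "side", "side"]
unlabelled_pool = (
    [(rng.gauss(0, 0.5), rng.gauss(0, 0.5)) for _ in range(50)]
    + [(rng.gauss(5, 0.5), rng.gauss(5, 0.5)) for _ in range(50)]
)
model, remaining = self_train(seed_points, seed_labels, unlabelled_pool)
print(sorted(model), "unlabelled examples left in pool:", remaining)
```

The confidence threshold is the key design choice in a loop of this kind: set too low, early mistakes are absorbed as pseudo-labels and reinforced in later rounds; set too high, little of the unlabelled data is ever used.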
