Recognition by detection: Perceiving human motion through part-configured feature maps

Perceiving human motion at the semantic level is an important but challenging problem in multimedia. In this work, we propose a novel approach that maps low-level responses from visual detection to a semantically sensitive description of human actions. The feature map is built from the output of deformable part model (DPM) detection, which implicitly contains critical information about the configuration of body parts under specific human actions. We map the filter responses of the detectors to an effective feature description that simultaneously encodes the position and appearance of the root and every body part. Statistically, the resulting feature map captures the significance of the relative configuration of body parts and is therefore robust to false detections from individual part detectors. We conduct comprehensive experiments, and the results show that the method generates discriminative action features and achieves strong performance in most cases.
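As a rough illustration of the idea of encoding root and part positions together with their filter responses, the following minimal sketch builds one feature vector from a single detection. It is not the paper's actual feature map: the `Box` type, the function name `part_configured_feature`, and the choice of root-normalized center offsets are all assumptions made here for illustration only.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Box:
    """Hypothetical detection box with its filter response (score)."""
    x: float      # left edge
    y: float      # top edge
    w: float      # width
    h: float      # height
    score: float  # detector filter response for this root or part


def center(box: Box) -> tuple:
    """Return the (cx, cy) center of a box."""
    return (box.x + box.w / 2.0, box.y + box.h / 2.0)


def part_configured_feature(root: Box, parts: List[Box]) -> List[float]:
    """Concatenate the root score with, for each part, its center offset
    from the root center (normalized by root size) and its score.

    Assumption: normalizing by the root box makes the encoding invariant
    to the scale and image position of the detected person.
    """
    if root.w <= 0 or root.h <= 0:
        raise ValueError("root box must have positive width and height")
    rcx, rcy = center(root)
    feature = [root.score]
    for part in parts:
        pcx, pcy = center(part)
        feature.extend([
            (pcx - rcx) / root.w,  # horizontal offset relative to root
            (pcy - rcy) / root.h,  # vertical offset relative to root
            part.score,            # appearance evidence for this part
        ])
    return feature


# Example: one root box and two part boxes (values are made up).
root = Box(0.0, 0.0, 100.0, 200.0, 1.5)
parts = [Box(40.0, 0.0, 20.0, 20.0, 0.8), Box(0.0, 180.0, 20.0, 20.0, -0.2)]
feature = part_configured_feature(root, parts)
```

A vector of this kind, with one root entry and three entries per part, could then be passed to a standard classifier. Because every part contributes to the same vector, a single spurious part detection changes only a few entries, which is one plausible reading of the robustness claim above.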
