Human-Centric Image Categorization Based on Poselets

In daily life, images in which several people are present and performing activities are common; we call these human-centric images. As the number of such images grows rapidly, organizing and accessing them efficiently becomes increasingly important. Since the category of a human-centric image is determined by the activities of the humans in it, in this paper we propose to classify human-centric images by analyzing the poses of all humans they contain. Specifically, we first introduce the notion of poselets, which represent parts of human poses, together with a poselet-based human detection method. Given a human-centric image, we use the poselets and the detection method to find all possible poselet activations and build a statistical representation of the poses of the humans in the image, from which its category is determined. We also investigate the influence of contextual information on human-centric image categorization. Finally, to evaluate the proposed method, five categories of human-centric images are collected from the internet and used in experiments. The experimental results show that the poselet distribution representation is better suited to representing human-centric images than the popular bag-of-visual-words approach.
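The pipeline described above (detect poselet activations, aggregate them into a distribution per image, then classify) can be illustrated with a minimal sketch. This is not the authors' implementation: the poselet detector is assumed to exist elsewhere, `activations` is a hypothetical list of (poselet_id, score) pairs for one image, the dictionary size and the use of a linear multiclass SVM are assumptions for illustration only.

```python
# Minimal sketch (assumed details, not the paper's exact method): build a
# poselet-activation histogram per image and classify it with a multiclass SVM.
import numpy as np
from sklearn.svm import LinearSVC

NUM_POSELETS = 150  # assumed size of the poselet dictionary

def poselet_histogram(activations, num_poselets=NUM_POSELETS):
    """Accumulate detection scores into a normalized poselet distribution."""
    hist = np.zeros(num_poselets)
    for poselet_id, score in activations:
        hist[poselet_id] += score           # weight each activation by its detection score
    norm = hist.sum()
    return hist / norm if norm > 0 else hist  # L1-normalize so the number of people cancels out

def train_classifier(train_activations, train_labels):
    """train_activations: list of per-image activation lists (hypothetical input format);
    train_labels: the corresponding human-centric category labels."""
    X = np.vstack([poselet_histogram(a) for a in train_activations])
    clf = LinearSVC()                        # one-vs-rest multiclass linear SVM
    clf.fit(X, train_labels)
    return clf
```

At test time, the same histogram is computed for an unseen image and passed to `clf.predict`; contextual features, where used, could simply be concatenated to the histogram before classification.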
