Paper information - Semantic Pyramids for Gender and Action Recognition

Semantic Pyramids for Gender and Action Recognition

Person description is a challenging problem in computer vision. We investigated two major aspects of person description: 1) gender and 2) action recognition in still images. Most state-of-the-art approaches for gender and action recognition rely on the description of a single body part, such as face or full-body. However, relying on a single body part is suboptimal due to significant variations in scale, viewpoint, and pose in real-world images. This paper proposes a semantic pyramid approach for pose normalization. Our approach is fully automatic and based on combining information from full-body, upper-body, and face regions for gender and action recognition in still images. The proposed approach does not require any annotations for upper-body and face of a person. Instead, we rely on pretrained state-of-the-art upper-body and face detectors to automatically extract semantic information of a person. Given multiple bounding boxes from each body part detector, we then propose a simple method to select the best candidate bounding box, which is used for feature extraction. Finally, the extracted features from the full-body, upper-body, and face regions are combined into a single representation for classification. To validate the proposed approach for gender recognition, experiments are performed on three large data sets namely: 1) human attribute; 2) head-shoulder; and 3) proxemics. For action recognition, we perform experiments on four data sets most used for benchmarking action recognition in still images: 1) Sports; 2) Willow; 3) PASCAL VOC 2010; and 4) Stanford-40. Our experiments clearly demonstrate that the proposed approach, despite its simplicity, outperforms state-of-the-art methods for gender and action recognition.
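
The pipeline described in the abstract (detect body parts, pick one candidate box per part, extract features, concatenate) can be sketched roughly as follows. This is only an illustrative sketch under stated assumptions, not the authors' implementation: the abstract does not say how the best candidate is chosen, so the rule used here (highest detector score among candidates whose center lies inside the full-body box) is an assumption, as are the names `Box`, `select_best`, `semantic_pyramid`, the `describe` feature extractor, and the zero-vector fallback for a missing part.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box; `score` is a hypothetical detector confidence."""
    x1: float
    y1: float
    x2: float
    y2: float
    score: float = 0.0


def contains_center(outer: Box, inner: Box) -> bool:
    """Return True if the center of `inner` lies inside `outer`."""
    cx = (inner.x1 + inner.x2) / 2.0
    cy = (inner.y1 + inner.y2) / 2.0
    return outer.x1 <= cx <= outer.x2 and outer.y1 <= cy <= outer.y2


def select_best(full_body: Box, candidates: Sequence[Box]) -> Optional[Box]:
    """Assumed selection rule: keep candidates centered inside the full-body
    box, then take the one with the highest detector score."""
    plausible = [c for c in candidates if contains_center(full_body, c)]
    if not plausible:
        return None
    return max(plausible, key=lambda c: c.score)


def semantic_pyramid(
    full_body: Box,
    upper_body_candidates: Sequence[Box],
    face_candidates: Sequence[Box],
    describe: Callable[[Box], List[float]],
    dim: int,
) -> List[float]:
    """Concatenate features from the full-body, upper-body, and face regions.

    `describe` stands in for any feature extractor that returns `dim` values
    per region. A missing part is filled with zeros (an assumed fallback).
    """
    regions = [
        full_body,
        select_best(full_body, upper_body_candidates),
        select_best(full_body, face_candidates),
    ]
    features: List[float] = []
    for region in regions:
        if region is None:
            features.extend([0.0] * dim)
        else:
            features.extend(describe(region))
    return features
```

The concatenated vector always has length `3 * dim`, so a single classifier can be trained on it regardless of which part detectors fired for a given person.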
