Paper Information - Two-Stream Part-Based Deep Representation for Human Attribute Recognition

Recognizing human attributes in unconstrained environments is a challenging computer vision problem. State-of-the-art approaches to human attribute recognition are based on convolutional neural networks (CNNs). The de facto practice when training these CNNs on a large labeled image dataset is to feed the RGB pixel values of an image to the network. In this work, we propose a two-stream part-based deep representation for human attribute classification. Besides the standard RGB stream, we train a second deep network on mapped coded images that carry explicit texture information, which complements the standard RGB deep model. To integrate knowledge of human body parts, we combine deformable part-based models with our two-stream deep model. Experiments are performed on the challenging Human Attributes (HAT-27) dataset, which covers 27 different human attributes. Our results show that (a) the two-stream deep network provides a consistent gain in performance over the standard RGB model and (b) the attribute classification results improve further with our two-stream part-based deep representations, leading to state-of-the-art results.
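
As a rough illustration of the two-stream idea, the sketch below builds a texture-coded image and fuses features from two streams. The abstract does not specify the coding scheme or the fusion rule, so both are assumptions here: the coding is a basic 3x3 local binary pattern (LBP), and the fusion is L2 normalization followed by concatenation. The function names `lbp_codes` and `fuse_streams` are hypothetical, and the feature vectors stand in for the outputs of the two networks.

```python
import numpy as np


def lbp_codes(gray):
    """Basic 3x3 LBP codes for the interior pixels of a grayscale image.

    Assumption: plain 8-neighbour LBP stands in for the paper's
    "mapped coded images"; the actual coding may differ.
    """
    gray = np.asarray(gray, dtype=np.float64)
    h, w = gray.shape
    center = gray[1:-1, 1:-1]
    # Neighbour offsets, clockwise starting at the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros(center.shape, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = gray[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        # Set this bit wherever the neighbour is at least as bright as the center.
        codes |= (neighbour >= center).astype(np.uint8) << np.uint8(bit)
    return codes


def fuse_streams(rgb_feat, tex_feat):
    """Late fusion (assumed): L2-normalise each stream, then concatenate."""
    def l2(v):
        v = np.asarray(v, dtype=np.float64)
        n = np.linalg.norm(v)
        return v / n if n > 0 else v
    return np.concatenate([l2(rgb_feat), l2(tex_feat)])
```

In this sketch the texture stream would be trained on the output of `lbp_codes` instead of raw RGB values, and a classifier per attribute would be trained on the fused vector. Concatenation keeps both streams' information intact, which is one simple way to let the texture stream complement the RGB stream.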
