Two-Stream Part-Based Deep Representation for Human Attribute Recognition

Recognizing human attributes in unconstrained environments is a challenging computer vision problem. State-of-the-art approaches to human attribute recognition are based on convolutional neural networks (CNNs). The de facto practice when training these CNNs on a large labeled image dataset is to take RGB pixel values of an image as input to the network. In this work, we propose a two-stream part-based deep representation for human attribute classification. Besides the standard RGB stream, we train a deep network by using mapped coded images with explicit texture information, that complements the standard RGB deep model. To integrate human body parts knowledge, we employ the deformable part-based models together with our two-stream deep model. Experiments are performed on the challenging Human Attributes (HAT-27) Dataset consisting of 27 different human attributes. Our results clearly show that (a) the two-stream deep network provides consistent gain in performance over the standard RGB model and (b) that the attribute classification results are further improved with our two-stream part-based deep representations, leading to state-of-the-art results.

[1]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[2]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[3]  Ivan Laptev,et al.  Learning and Transferring Mid-level Image Representations Using Convolutional Neural Networks , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[4]  Matti Pietikäinen,et al.  Multiresolution Gray-Scale and Rotation Invariant Texture Classification with Local Binary Patterns , 2002, IEEE Trans. Pattern Anal. Mach. Intell..

[5]  Hao Guo,et al.  Human attribute recognition by refining attention heat map , 2017, Pattern Recognit. Lett..

[6]  Iasonas Kokkinos,et al.  Deep Filter Banks for Texture Recognition, Description, and Segmentation , 2015, International Journal of Computer Vision.

[7]  Thomas Mensink,et al.  Image Classification with the Fisher Vector: Theory and Practice , 2013, International Journal of Computer Vision.

[8]  Andrew Zisserman,et al.  Two-Stream Convolutional Networks for Action Recognition in Videos , 2014, NIPS.

[9]  Jitendra Malik,et al.  Actions and Attributes from Wholes and Parts , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).

[10]  Svetlana Lazebnik,et al.  Scene recognition and weakly supervised object localization with deformable part-based models , 2011, 2011 International Conference on Computer Vision.

[11]  Joost van de Weijer,et al.  Deep Semantic Pyramids for Human Attributes and Action Recognition , 2015, SCIA.

[12]  Shengcai Liao,et al.  Multi-label CNN based pedestrian attribute learning for soft biometrics , 2015, 2015 International Conference on Biometrics (ICB).

[13]  Andrew Zisserman,et al.  Return of the Devil in the Details: Delving Deep into Convolutional Nets , 2014, BMVC.

[14]  Cordelia Schmid,et al.  Aggregating local descriptors into a compact image representation , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[15]  Cordelia Schmid,et al.  Expanded Parts Model for Human Attribute and Action Recognition in Still Images , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[16]  Michael Felsberg,et al.  Semantic Pyramids for Gender and Action Recognition , 2014, IEEE Transactions on Image Processing.

[17]  Trevor Darrell,et al.  PANDA: Pose Aligned Networks for Deep Attribute Modeling , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[18]  Jeremy S. Smith,et al.  Action Recognition from Still Images Based on Deep VLAD Spatial Pyramids , 2017, Signal Process. Image Commun..

[19]  Michael Felsberg,et al.  Scale coding bag of deep features for human attribute and action recognition , 2016, Machine Vision and Applications.

[20]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[21]  Matti Pietikäinen,et al.  Evaluation of LBP and Deep Texture Descriptors with a New Robustness Benchmark , 2016, ECCV.

[22]  B. K. Julsing,et al.  Face Recognition with Local Binary Patterns , 2012 .

[23]  Rob Fergus,et al.  Visualizing and Understanding Convolutional Networks , 2013, ECCV.

[24]  Luc Van Gool,et al.  The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[25]  Gaurav Sharma,et al.  Learning discriminative spatial representation for image classification , 2011, BMVC.

[26]  Jitendra Malik,et al.  Poselets: Body part detectors trained using 3D human pose annotations , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[27]  Cordelia Schmid,et al.  Expanded Parts Model for Semantic Description of Humans in Still Images , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[28]  Michael Felsberg,et al.  Compact color-texture description for texture classification , 2015, Pattern Recognit. Lett..

[29]  Song-Chun Zhu,et al.  Human Attribute Recognition by Rich Appearance Dictionary , 2013, 2013 IEEE International Conference on Computer Vision.

[30]  Stefan Carlsson,et al.  CNN Features Off-the-Shelf: An Astounding Baseline for Recognition , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops.

[31]  Fahad Shahbaz Khan,et al.  Combining Holistic and Part-based Deep Representations for Computational Painting Categorization , 2016, ICMR.

[32]  Tal Hassner,et al.  Emotion Recognition in the Wild via Convolutional Neural Networks and Mapped Binary Patterns , 2015, ICMI.

[33]  Xiaoou Tang,et al.  Pedestrian Attribute Recognition At Far Distance , 2014, ACM Multimedia.

[34]  Fahad Shahbaz Khan,et al.  Binary Patterns Encoded Convolutional Neural Networks for Texture Recognition and Remote Sensing Scene Classification , 2017, ArXiv.

[35]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[36]  Anton van den Hengel,et al.  The treasure beneath convolutional layers: Cross-convolutional-layer pooling for image classification , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[37]  Andrea Vedaldi,et al.  MatConvNet: Convolutional Neural Networks for MATLAB , 2014, ACM Multimedia.

[38]  Forrest N. Iandola,et al.  Deformable Part Descriptors for Fine-Grained Recognition and Attribute Prediction , 2013, 2013 IEEE International Conference on Computer Vision.

[39]  Fahad Shahbaz Khan,et al.  TEX-Nets: Binary Patterns Encoded Convolutional Neural Networks for Texture Recognition , 2017, ICMR.