The Deeper, the Better: Analysis of Person Attributes Recognition

Research into person attribute recognition has focused on approaches that describe a person in terms of their appearance. Typically, this covers a wide range of traits, such as age, gender, clothing, and footwear. Although attribute recognition could be used in a wide variety of scenarios, it is generally applied to video surveillance, where accuracy is hampered by low resolution and by other issues such as variable pose, occlusion, and shadow. Recent approaches have used deep convolutional neural networks (CNNs) to improve the accuracy of person attribute recognition. However, many of these networks are relatively shallow, and it is unclear to what extent they use contextual cues to improve classification accuracy. This paper builds upon prior research by proposing a modified ResNet architecture with calibrations that permit us to train networks deeper than those in previously published approaches. Interpretation suggests that these deeper architectures allow the network to take more contextual information into account, which helps to improve classification accuracy and generalizability. We present experimental analysis and results for whole-body attributes on the PA-100K and PETA datasets and for facial attributes on the CelebA dataset.
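
The abstract does not specify the calibrations in the modified ResNet. The following is therefore only a minimal sketch of the generic residual idea that such architectures build on, not the authors' method. A basic residual block computes y = relu(x + F(x)). When the residual branch F outputs zero, the block passes a non-negative input through unchanged, which is one reason very deep stacks remain trainable. Several assumptions are made for illustration: dense matrix products stand in for convolutions, and the multi-label head is hypothetical. That head uses one independent sigmoid per attribute with a 0.5 threshold, and its attribute names are illustrative only.

```python
import math
from typing import Dict, List

Vector = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vector) -> Vector:
    """Dense matrix-vector product (an assumed stand-in for a convolution)."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def relu(x: Vector) -> Vector:
    return [max(0.0, v) for v in x]


def residual_block(x: Vector, w1: Matrix, w2: Matrix) -> Vector:
    """Basic residual block: y = relu(x + F(x)) with F(x) = W2 . relu(W1 . x)."""
    f = matvec(w2, relu(matvec(w1, x)))
    return relu([xi + fi for xi, fi in zip(x, f)])


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def predict_attributes(
    features: Vector, w_head: Matrix, names: List[str], threshold: float = 0.5
) -> Dict[str, bool]:
    """Hypothetical multi-label head: one independent sigmoid per attribute,
    so several attributes can be present for the same person at once."""
    logits = matvec(w_head, features)
    return {name: sigmoid(z) >= threshold for name, z in zip(names, logits)}


if __name__ == "__main__":
    dim = 4
    zero = [[0.0] * dim for _ in range(dim)]
    x = [0.5, 1.0, 0.0, 2.0]
    h = x
    # 50 stacked blocks whose residual branches output zero act as an identity mapping.
    for _ in range(50):
        h = residual_block(h, zero, zero)
    print(h == x)  # True
    # Hypothetical head weights and attribute names, for illustration only.
    w_head = [[1.0, 0.0, 0.0, 0.0], [0.0, -1.0, 0.0, 0.0], [0.0, 0.0, 0.0, 1.0]]
    print(predict_attributes(h, w_head, ["female", "hat", "backpack"]))
    # {'female': True, 'hat': False, 'backpack': True}
```

In a real network, F would consist of learned convolutional layers, and the head weights would be trained rather than set by hand. The point of the toy example is only that adding blocks does not by itself degrade the signal.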
