Spatial and Semantic Relations for Pedestrian Attribute Recognition

This paper addresses the problem of pedestrian attribute recognition. Previous works typically treat the attributes independently of each other, ignoring possible dependencies between them, or consider only semantic relations or only spatial relations. We propose an end-to-end learning framework, combined with a subnet for multi-task classification, that takes both spatial and semantic relations into account and proves more accurate and effective. Our method handles not only binary attributes but also multi-class attributes in a single network, overcoming a drawback of many methods, which can recognize only binary attributes under joint training. Experiments on several benchmarks demonstrate the superiority of our method.
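As an illustration of what training binary and multi-class attributes jointly in a single network can look like, here is a minimal sketch of a combined loss. It is not the paper's formulation: the abstract does not specify the loss, so the choice of sigmoid cross-entropy for binary attributes, softmax cross-entropy for multi-class attributes, an unweighted sum, and all function names are assumptions made for illustration.

```python
import math


def binary_cross_entropy(logit, label):
    """Sigmoid cross-entropy for one binary attribute (label is 0 or 1)."""
    # Numerically stable form: max(x, 0) - x*y + log(1 + exp(-|x|))
    return max(logit, 0.0) - logit * label + math.log1p(math.exp(-abs(logit)))


def softmax_cross_entropy(logits, label):
    """Softmax cross-entropy for one multi-class attribute (label is a class index)."""
    m = max(logits)  # subtract the max before exponentiating for stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def joint_attribute_loss(binary_logits, binary_labels,
                         multiclass_logits, multiclass_labels):
    """Unweighted sum of per-attribute losses (hypothetical combination rule)."""
    loss = sum(binary_cross_entropy(z, y)
               for z, y in zip(binary_logits, binary_labels))
    loss += sum(softmax_cross_entropy(zs, y)
                for zs, y in zip(multiclass_logits, multiclass_labels))
    return loss


# Hypothetical example: two binary attributes and one three-class attribute.
example_loss = joint_attribute_loss(
    binary_logits=[2.0, -1.0],
    binary_labels=[1, 0],
    multiclass_logits=[[0.5, 2.0, -0.3]],
    multiclass_labels=[1],
)
```

In this sketch each attribute keeps its own output head, so a binary attribute contributes one logit and a multi-class attribute contributes one logit per class, while all heads share a single scalar objective.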
