An Attention-Based Deep Learning Model for Multiple Pedestrian Attributes Recognition

The automatic characterization of pedestrians in surveillance footage is challenging, particularly when the data is extremely diverse, backgrounds are cluttered, and subjects are captured from varying distances, in multiple poses, and under partial occlusion. Having observed that state-of-the-art performance is still unsatisfactory, this paper provides a novel solution to the problem, with two contributions: 1) considering the strong semantic correlation between the different full-body attributes, we propose a multi-task deep model that uses an element-wise multiplication layer to extract more comprehensive feature representations. In practice, this layer serves as a filter that removes irrelevant background features, and it is particularly important for handling complex, cluttered data; and 2) we introduce a weighted-sum term to the loss function that not only relativizes the contribution of each task (kind of attribute) but also is crucial for improving performance in multiple-attribute inference settings. Our experiments were performed on two well-known datasets (RAP and PETA) and point to the superiority of the proposed method with respect to the state of the art. The code is available at this https URL.
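The two ideas named in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not say how the mask is produced or how the task weights are chosen, so the function names (`apply_mask`, `weighted_multitask_loss`), the toy mask, and the weight values below are all assumptions made for illustration only.

```python
import numpy as np


def apply_mask(features, mask):
    """Element-wise multiplication of a feature map by a soft mask.

    features: array of shape (H, W, C), e.g. backbone activations.
    mask:     array of shape (H, W) with values in [0, 1]; how the mask
              is obtained is an assumption and is not modelled here.
    Positions where the mask is 0 are suppressed in every channel.
    """
    # Add a trailing axis so the (H, W) mask broadcasts over the C channels.
    return features * mask[..., np.newaxis]


def weighted_multitask_loss(task_losses, task_weights):
    """Weighted sum of per-task losses (one task per kind of attribute)."""
    task_losses = np.asarray(task_losses, dtype=float)
    task_weights = np.asarray(task_weights, dtype=float)
    if task_losses.shape != task_weights.shape:
        raise ValueError("need exactly one weight per task loss")
    return float(np.sum(task_weights * task_losses))


# Toy data (hypothetical values, chosen only to make the effect visible).
features = np.ones((2, 2, 3))
mask = np.array([[1.0, 0.0],
                 [0.5, 0.0]])
filtered = apply_mask(features, mask)

# Three hypothetical tasks with hypothetical per-task losses and weights.
total = weighted_multitask_loss([0.7, 0.2, 1.1], [0.5, 0.3, 0.2])
```

In this toy case, `filtered` keeps the top-left location unchanged, halves the bottom-left one, and zeroes the right-hand column across all channels, which is the filtering behaviour the abstract attributes to the multiplication layer. The weights in `weighted_multitask_loss` set how much each task contributes to the total; in a trained model they would be fixed hyperparameters or learned values rather than the constants used here.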
