Rethinking of Pedestrian Attribute Recognition: Realistic Datasets with Efficient Method

Although various methods have been proposed to advance pedestrian attribute recognition, a crucial problem in existing datasets is often neglected: many pedestrian identities appear in both the training and test sets, which is inconsistent with practical applications. As a result, training and test images of the same identity are extremely similar, so the performance of state-of-the-art methods on existing datasets is overestimated. To address this problem, we propose two realistic datasets, PETA\textsubscript{$zs$} and RAPv2\textsubscript{$zs$}, which are built from the PETA and RAPv2 datasets and follow a zero-shot setting with respect to pedestrian identities. Furthermore, we observe that recent state-of-the-art methods do not improve on our strong baseline method on PETA, RAPv2, PETA\textsubscript{$zs$} and RAPv2\textsubscript{$zs$}. We therefore propose an efficient method that further improves performance by addressing the inherent attribute imbalance in pedestrian attribute recognition. Experiments on the existing and proposed datasets verify the superiority of our method, which achieves state-of-the-art performance.
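
The zero-shot identity setting described in the abstract can be illustrated with a minimal sketch of an identity-disjoint split: whole identities, not individual images, are assigned to either the training or the test set, so no identity appears in both. This is only an illustration under stated assumptions, not the procedure used to construct PETA\textsubscript{$zs$} or RAPv2\textsubscript{$zs$}, which may impose additional constraints. The function names, the `(image_name, identity_id)` record layout, and the `test_fraction` parameter are all hypothetical.

```python
import random
from collections import defaultdict


def identity_disjoint_split(records, test_fraction=0.2, seed=0):
    """Split (image_name, identity_id) records so identities never overlap.

    Hypothetical helper: the record layout and parameters are assumptions
    made for illustration, not taken from the paper.
    """
    # Group images by the pedestrian identity they belong to.
    by_identity = defaultdict(list)
    for image_name, identity_id in records:
        by_identity[identity_id].append(image_name)

    # Shuffle identities (not images) with a fixed seed for reproducibility.
    identities = sorted(by_identity)
    random.Random(seed).shuffle(identities)

    # Reserve a fraction of the identities, with all their images, for testing.
    n_test = max(1, round(len(identities) * test_fraction))
    test_ids, train_ids = identities[:n_test], identities[n_test:]

    train = [(img, pid) for pid in train_ids for img in by_identity[pid]]
    test = [(img, pid) for pid in test_ids for img in by_identity[pid]]
    return train, test


def shared_identities(train, test):
    """Return identities present in both splits (empty under zero-shot)."""
    return {pid for _, pid in train} & {pid for _, pid in test}


# Toy data: 10 identities with 3 images each.
records = [(f"img_{pid}_{k}.png", pid) for pid in range(10) for k in range(3)]
train, test = identity_disjoint_split(records)
print(len(train), len(test), shared_identities(train, test))
```

By contrast, a split that shuffles individual images would typically place images of the same identity on both sides, which is the overlap the abstract identifies as the source of overestimated performance.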
