Hierarchical pedestrian attribute recognition based on adaptive region localization

Learning to recognize pedestrian attributes (such as gender, hair style, or whether a hat is worn) in video surveillance scenarios is critical to a variety of tasks, such as crime prevention and border control. However, the task remains challenging because of the low resolution and strong highlights found in real surveillance footage, where traditional methods perform poorly. This paper proposes a robust pedestrian attribute recognition framework that adapts to real surveillance scenarios. Specifically, we first propose a hierarchical recognition strategy that heuristically divides pedestrian attributes into global ones (such as gender and age) and local ones (such as hair style and wearing glasses). The whole pedestrian region is then used for global attribute recognition, while the relevant local regions are used for local attribute recognition. To locate these relevant regions, we further propose an adaptive region localization scheme, consisting of position estimation based on the geometry of the human body and relevant region localization based on random expansion. Finally, experimental results on representative datasets and on our own surveillance scenarios demonstrate the effectiveness of the proposed method.
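The routing of attributes to regions described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the attribute groupings, the body-part proportions in `PART_SPANS`, the expansion ratio, and the helper names (`estimate_part`, `random_expansions`, `regions_for`, `select_region`) are all hypothetical. The abstract does not say how expanded candidates are scored, so `select_region` simply assumes a caller-supplied scoring function (for example, a classifier's confidence).

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned region inside a pedestrian crop, in pixels."""
    x0: int
    y0: int
    x1: int
    y1: int


# Hypothetical grouping: global attributes use the whole crop,
# local attributes are tied to one body part (not the paper's actual lists).
GLOBAL_ATTRS = {"gender", "age"}
LOCAL_ATTRS = {"hair_style": "head", "glasses": "head", "backpack": "upper_body"}

# Assumed vertical extent of each part, as fractions of the crop height.
PART_SPANS = {"head": (0.0, 0.18), "upper_body": (0.15, 0.55), "lower_body": (0.5, 1.0)}


def estimate_part(width: int, height: int, part: str) -> Box:
    """Geometric position estimate: a fixed horizontal band of the crop."""
    top, bottom = PART_SPANS[part]
    return Box(0, int(top * height), width, int(bottom * height))


def random_expansions(box, width, height, n=5, max_ratio=0.2, rng=None):
    """Candidate regions made by enlarging `box` by random margins, clipped to the crop."""
    if rng is None:
        rng = random.Random(0)
    bw, bh = box.x1 - box.x0, box.y1 - box.y0
    candidates = []
    for _ in range(n):
        # Each side grows independently by up to max_ratio of the box size.
        dx0, dx1 = (int(rng.uniform(0, max_ratio) * bw) for _ in range(2))
        dy0, dy1 = (int(rng.uniform(0, max_ratio) * bh) for _ in range(2))
        candidates.append(Box(max(0, box.x0 - dx0), max(0, box.y0 - dy0),
                              min(width, box.x1 + dx1), min(height, box.y1 + dy1)))
    return candidates


def regions_for(attr, width, height, n=5, rng=None):
    """Hierarchical routing: whole crop for global attributes, expanded parts for local ones."""
    if attr in GLOBAL_ATTRS:
        return [Box(0, 0, width, height)]
    part_box = estimate_part(width, height, LOCAL_ATTRS[attr])
    return random_expansions(part_box, width, height, n=n, rng=rng)


def select_region(candidates, score_fn):
    """Assumed selection rule: keep the candidate the scoring function rates highest."""
    return max(candidates, key=score_fn)
```

For a 64×160 crop, `regions_for("gender", 64, 160)` returns the single full-crop box, while `regions_for("hair_style", 64, 160, n=3)` returns three slightly enlarged versions of the estimated head band. Random expansion is one way to tolerate errors in the fixed geometric estimate when the pedestrian's pose or the detector's crop deviates from the assumed proportions.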
