论文信息 - Human Attribute Recognition by Deep Hierarchical Contexts

Human Attribute Recognition by Deep Hierarchical Contexts

We present an approach for recognizing human attributes in unconstrained settings. We train a Convolutional Neural Network (CNN) to select the most attribute-descriptive human parts from all poselet detections, and combine them with the whole body as a pose-normalized deep representation. We further improve by using deep hierarchical contexts ranging from human-centric level to scene level. Human-centric context captures human relations, which we compute from the nearest neighbor parts of other people on a pyramid of CNN feature maps. The matched parts are then average pooled and they act as a similarity regularization. To utilize the scene context, we re-score human-centric predictions by the global scene classification score jointly learned in our CNN, yielding final scene-aware predictions. To facilitate our study, a large-scale WIDER Attribute dataset(Dataset URL: http://mmlab.ie.cuhk.edu.hk/projects/WIDERAttribute) is introduced with human attribute and image event annotations, and our method surpasses competitive baselines on this dataset and other popular ones.

[1] Ali Farhadi,et al. Describing objects by their attributes , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[2] Pietro Perona,et al. Fine-grained classification of pedestrians in video: Benchmark and state of the art , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[3] Sanja Fidler,et al. The Role of Context for Object Detection and Semantic Segmentation in the Wild , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[4] Paul A. Viola,et al. A unified learning framework for real time face detection and classification , 2002, Proceedings of Fifth IEEE International Conference on Automatic Face Gesture Recognition.

[5] Chen Huang,et al. Unsupervised Learning of Discriminative Attributes and Visual Representations , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[7] Larry S. Davis,et al. Multi-Task Learning with Low Rank Attribute Embedding for Person Re-Identification , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[8] Eli Shechtman,et al. In defense of Nearest-Neighbor based image classification , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[9] Pietro Perona,et al. Improved Bird Species Recognition Using Pose Normalized Deep Convolutional Nets , 2014, BMVC.

[10] Bastian Leibe,et al. Person Attribute Recognition with a Jointly-Trained Holistic CNN Model , 2015, 2015 IEEE International Conference on Computer Vision Workshop (ICCVW).

[11] Jitendra Malik,et al. Actions and Attributes from Wholes and Parts , 2014, 2015 IEEE International Conference on Computer Vision (ICCV).

[12] Antonio Torralba,et al. Exploiting hierarchical context on a large database of object categories , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[13] Shaogang Gong,et al. Person Re-identification by Attributes , 2012, BMVC.

[14] Antonio Torralba,et al. Object Recognition by Scene Alignment , 2007, NIPS.

[15] Christoph H. Lampert,et al. Learning to detect unseen object classes by between-class attribute transfer , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[16] Koen E. A. van de Sande,et al. Selective Search for Object Recognition , 2013, International Journal of Computer Vision.

[17] Song-Chun Zhu,et al. Human Attribute Recognition by Rich Appearance Dictionary , 2013, 2013 IEEE International Conference on Computer Vision.

[18] Subhransu Maji,et al. Detecting People Using Mutually Consistent Poselet Activations , 2010, ECCV.

[19] Dahua Lin,et al. Recognize complex events from static images by fusing deep channels , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[20] Cordelia Schmid,et al. Expanded Parts Model for Semantic Description of Humans in Still Images , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[21] David A. McAllester,et al. Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[22] Trevor Darrell,et al. PANDA: Pose Aligned Networks for Deep Attribute Modeling , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[23] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[24] Shree K. Nayar,et al. FaceTracer: A Search Engine for Large Collections of Images with Faces , 2008, ECCV.

[25] Chunxiao Liu,et al. On-the-fly feature importance mining for person re-identification , 2014, Pattern Recognit..

[26] Shree K. Nayar,et al. Attribute and simile classifiers for face verification , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[27] Pietro Perona,et al. Bird Species Categorization Using Pose Normalized Deep Convolutional Nets , 2014, ArXiv.

[28] Trevor Darrell,et al. Pose pooling kernels for sub-category recognition , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[29] Shaogang Gong,et al. Person Re-Identification , 2014 .

[30] Forrest N. Iandola,et al. Deformable Part Descriptors for Fine-Grained Recognition and Attribute Prediction , 2013, 2013 IEEE International Conference on Computer Vision.

[31] Antonio Torralba,et al. Contextual Priming for Object Detection , 2003, International Journal of Computer Vision.

[32] Ming-Hsuan Yang,et al. Learning Gender with Support Faces , 2002, IEEE Trans. Pattern Anal. Mach. Intell..

[33] A. Torralba,et al. The role of context in object recognition , 2007, Trends in Cognitive Sciences.

[34] David G. Lowe,et al. Local Naive Bayes Nearest Neighbor for image classification , 2011, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[35] Xiaoou Tang,et al. Pedestrian Attribute Recognition At Far Distance , 2014, ACM Multimedia.

[36] Jitendra Malik,et al. Contextual Action Recognition with R*CNN , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[37] Ross B. Girshick,et al. Fast R-CNN , 2015, 1504.08083.

[38] Tsuhan Chen,et al. Extracting adaptive contextual cues from unlabeled regions , 2011, 2011 International Conference on Computer Vision.

[39] Subhransu Maji,et al. Describing people: A poselet-based approach to attribute classification , 2011, 2011 International Conference on Computer Vision.

[40] Gaurav Sharma,et al. Learning discriminative spatial representation for image classification , 2011, BMVC.

[41] Jitendra Malik,et al. Poselets: Body part detectors trained using 3D human pose annotations , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[42] Lamberto Ballan,et al. Love Thy Neighbors: Image Annotation by Exploiting Image Metadata , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[43] Trevor Darrell,et al. Part-Based R-CNNs for Fine-Grained Category Detection , 2014, ECCV.