Deep domain adaptation for describing people based on fine-grained clothing attributes

We address the problem of describing people based on fine-grained clothing attributes. This is an important problem for many practical applications, such as identifying target suspects or finding missing people based on detailed clothing descriptions in surveillance videos or consumer photos. We approach this problem by first mining clothing images with fine-grained attribute labels from online shopping stores. A large-scale dataset is built with about one million images and fine-detailed attribute sub-categories, such as various shades of color (e.g., watermelon red, rosy red, purplish red), clothing types (e.g., down jacket, denim jacket), and patterns (e.g., thin horizontal stripes, houndstooth). As these images are taken in ideal pose/lighting/background conditions, it is unreliable to directly use them as training data for attribute prediction in the domain of unconstrained images captured, for example, by mobile phones or surveillance cameras. In order to bridge this gap, we propose a novel double-path deep domain adaptation network to model the data from the two domains jointly. Several alignment cost layers placed inbetween the two columns ensure the consistency of the two domain features and the feasibility to predict unseen attribute categories in one of the domains. Finally, to achieve a working system with automatic human body alignment, we trained an enhanced RCNN-based detector to localize human bodies in images. Our extensive experimental evaluation demonstrates the effectiveness of the proposed approach for describing people based on fine-grained clothing attributes.

[1]  David A. McAllester,et al.  A discriminatively trained, multiscale, deformable part model , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[2]  Marwan Mattar,et al.  Labeled Faces in the Wild: A Database forStudying Face Recognition in Unconstrained Environments , 2008 .

[3]  Christoph H. Lampert,et al.  Learning to detect unseen object classes by between-class attribute transfer , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[4]  Yann LeCun,et al.  What is the best multi-stage architecture for object recognition? , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[5]  Rogério Schmidt Feris,et al.  Attribute-based people search in surveillance environments , 2009, 2009 Workshop on Applications of Computer Vision (WACV).

[6]  Ali Farhadi,et al.  Describing objects by their attributes , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[7]  Fei-Fei Li,et al.  ImageNet: A large-scale hierarchical image database , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[8]  Shree K. Nayar,et al.  Attribute and simile classifiers for face verification , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[9]  Bernt Schiele,et al.  What helps where – and why? Semantic relatedness for knowledge transfer , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[10]  Pietro Perona,et al.  Visual Recognition with Humans in the Loop , 2010, ECCV.

[11]  Yejin Choi,et al.  Baby talk: Understanding and generating simple image descriptions , 2011, CVPR 2011.

[12]  Kristen Grauman,et al.  Relative attributes , 2011, 2011 International Conference on Computer Vision.

[13]  Yoshua Bengio,et al.  Deep Sparse Rectifier Neural Networks , 2011, AISTATS.

[14]  Subhransu Maji,et al.  Describing people: A poselet-based approach to attribute classification , 2011, 2011 International Conference on Computer Vision.

[15]  Pietro Perona,et al.  The Caltech-UCSD Birds-200-2011 Dataset , 2011 .

[16]  Larry S. Davis,et al.  Image ranking and retrieval based on multi-attribute queries , 2011, CVPR 2011.

[17]  Nitish Srivastava,et al.  Improving neural networks by preventing co-adaptation of feature detectors , 2012, ArXiv.

[18]  Luis E. Ortiz,et al.  Parsing clothing in fashion photographs , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[19]  Changsheng Xu,et al.  Street-to-shop: Cross-scenario clothing retrieval via parts alignment and auxiliary set , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[20]  Adriana Kovashka,et al.  WhittleSearch: Image search with relative attribute feedback , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[21]  Yuan Shi,et al.  Geodesic flow kernel for unsupervised domain adaptation , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[22]  Shaogang Gong,et al.  Person Re-identification by Attributes , 2012, BMVC.

[23]  Boqing Gong,et al.  Overcoming Dataset Bias : An Unsupervised Domain Adaptation Approach , 2012 .

[24]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[25]  Huizhong Chen,et al.  Describing Clothing by Semantic Attributes , 2012, ECCV.

[26]  Hanqing Lu,et al.  Street-to-shop: Cross-scenario clothing retrieval via parts alignment and auxiliary set , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[27]  Tinne Tuytelaars,et al.  Unsupervised Visual Domain Adaptation Using Subspace Alignment , 2013, 2013 IEEE International Conference on Computer Vision.

[28]  Xinlei Chen,et al.  NEIL: Extracting Visual Knowledge from Web Data , 2013, 2013 IEEE International Conference on Computer Vision.

[29]  Xiaogang Wang,et al.  A Deep Sum-Product Architecture for Robust Facial Attributes Analysis , 2013, 2013 IEEE International Conference on Computer Vision.

[30]  Sumit Chopra,et al.  DLID: Deep Learning for Domain Adaptation by Interpolating between Domains , 2013 .

[31]  Vishal M. Patel,et al.  Joint Hierarchical Domain Adaptation and Feature Learning , 2013 .

[32]  Shengcai Liao,et al.  Pedestrian Attribute Classification in Surveillance: Database and Evaluation , 2013, 2013 IEEE International Conference on Computer Vision Workshops.

[33]  Yoshua Bengio,et al.  Maxout Networks , 2013, ICML.

[34]  Iasonas Kokkinos,et al.  Understanding Objects in Detail with Fine-Grained Attributes , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[35]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[36]  Jian Dong,et al.  Towards Unified Human Parsing and Pose Estimation , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[37]  Qiang Chen,et al.  Network In Network , 2013, ICLR.

[38]  Stefan Carlsson,et al.  CNN Features Off-the-Shelf: An Astounding Baseline for Recognition , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition Workshops.

[39]  Trevor Darrell,et al.  One-Shot Adaptation of Supervised Deep Convolutional Models , 2013, ICLR.

[40]  Shuicheng Yan,et al.  CNN: Single-label to Multi-label , 2014, ArXiv.

[41]  Trevor Darrell,et al.  PANDA: Pose Aligned Networks for Deep Attribute Modeling , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[42]  Alexander C. Berg,et al.  Hipster Wars: Discovering Elements of Fashion Styles , 2014, ECCV.

[43]  Rama Chellappa,et al.  Unsupervised Adaptation Across Domain Shifts by Generating Intermediate Data Representations , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[44]  Ali Farhadi,et al.  Learning Everything about Anything: Webly-Supervised Visual Concept Learning , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[45]  Sharath Pankanti,et al.  Attribute-based People Search: Lessons Learnt from a Practical Surveillance System , 2014, ICMR.

[46]  Ivan Laptev,et al.  Learning and Transferring Mid-level Image Representations Using Convolutional Neural Networks , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[47]  Shuicheng Yan,et al.  Fashion Analysis: Current Techniques and Future Directions , 2014, IEEE MultiMedia.

[48]  Rama Chellappa,et al.  DASH-N: Joint Hierarchical Domain Adaptation and Feature Learning , 2015, IEEE Transactions on Image Processing.

[49]  Bingbing Ni,et al.  HCP: A Flexible CNN Framework for Multi-Label Image Classification , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[50]  K. Ravi Kumar,et al.  Soft Biometrics for Subject Identification Using Clothing Attributes , 2017 .