Paper Information - Multi-task Curriculum Transfer Deep Learning of Clothing Attributes

Recognising detailed clothing characteristics (fine-grained attributes) in unconstrained images of people in-the-wild is a challenging task for computer vision, especially when there is only limited training data from the wild whilst most data available for model learning are captured in well-controlled environments using fashion models (well lit, no background clutter, frontal view, high-resolution). In this work, we develop a deep learning framework capable of model transfer learning from well-controlled shop clothing images collected from web retailers to in-the-wild images from the street. Specifically, we formulate a novel Multi-Task Curriculum Transfer (MTCT) deep learning method to explore multiple sources of different types of web annotations with multi-labelled fine-grained attributes. Our multi-task loss function is designed to extract more discriminative representations in training by jointly learning all attributes, and our curriculum strategy exploits the staged easy-to-hard transfer learning motivated by cognitive studies. We demonstrate the advantages of the MTCT model over the state-of-the-art methods on the X-Domain benchmark, a large-scale clothing attribute dataset. Moreover, we show that the MTCT model has a notable advantage over contemporary models when the training data size is small.
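The two ideas named in the abstract, a joint loss over all attributes and a staged easy-to-hard schedule, can be illustrated with a minimal sketch. This is not the MTCT implementation: it assumes each attribute is a separate softmax classification task, that the joint loss is a plain unweighted sum of per-attribute cross-entropies, and that "easy" means shop images while "hard" means street images. The attribute names (`sleeve`, `colour`) and all function names are hypothetical.

```python
import math


def softmax_cross_entropy(logits, label):
    """Cross-entropy of one softmax prediction against an integer class label."""
    m = max(logits)  # subtract the max for numerical stability
    log_sum_exp = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum_exp - logits[label]


def multi_task_loss(logits_per_attribute, labels_per_attribute):
    """Sum the per-attribute losses so that all attributes are learned jointly.

    Assumption: an unweighted sum; the paper's actual loss may differ.
    """
    return sum(
        softmax_cross_entropy(logits_per_attribute[name], labels_per_attribute[name])
        for name in labels_per_attribute
    )


def curriculum_stages(shop_samples, street_samples):
    """Yield (stage_name, samples) in easy-to-hard order.

    Assumption: clean shop images form the easy stage and street images
    form the hard stage.
    """
    yield "easy: shop images", list(shop_samples)
    yield "hard: street images", list(street_samples)


# Toy example with two hypothetical attributes for a single image.
logits = {"sleeve": [2.0, 0.5], "colour": [0.1, 0.1, 0.1]}
labels = {"sleeve": 0, "colour": 2}
loss = multi_task_loss(logits, labels)

for stage_name, samples in curriculum_stages(["shop_1", "shop_2"], ["street_1"]):
    print(stage_name, len(samples))
```

In a real training loop each stage would run its own optimisation pass, with the model from the easy stage used to initialise the hard stage; that part is omitted here because the abstract does not specify it.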
