Deep embedding of concept ontology for hierarchical fashion recognition

Abstract: The natural concept ontology of clothing lets online retailers manage large collections of fashion images easily, so automatic recognition of fashion images matters for both commercial promotion and academic research. In this paper, a new hierarchical approach is developed for large-scale fashion recognition. We first embed the concept ontology into a deep convolutional neural network (CNN) by adopting multiple deep CNN branches to learn node-specific features and classifiers explicitly. Then, we introduce a hierarchical knowledge distillation method to further improve the performance of fashion recognition. Finally, we employ the proposed approach for fashion recommendation. To handle the hierarchical deep learning constraints, we use back-propagation with a joint objective function to simultaneously refine the shared deep CNN and the diverse CNN branches for the relevant node features and classifiers. The main advantages of this approach are (1) an effective way to recognize rich semantic explanations of fashion images without training large or multiple networks, and (2) reduced storage and time costs by learning personalized features and classifiers for each tree node. Experimental results on both our organized fashion dataset and the public DeepFashion dataset verify the effectiveness and efficiency of the proposed approach for both hierarchical fashion recognition and within-category fine-grained fashion recommendation.
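The abstract does not state the form of the joint objective, so the following is only a minimal sketch of the general idea: one shared feature layer, one classifier branch per ontology node, and a loss that sums per-node cross-entropy with a temperature-softened distillation term in the style of Hinton et al. The node names (`root`, `tops`), the weights, the teacher logits, and the parameters `alpha` and `T` are all hypothetical and chosen for illustration; they are not taken from the paper.

```python
import math

def softmax(logits, T=1.0):
    # Temperature-scaled softmax, shifted by the max for numerical stability.
    scaled = [z / T for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, label):
    # Negative log-likelihood of the true class for one sample.
    return -math.log(softmax(logits)[label])

def distill_kl(student_logits, teacher_logits, T=2.0):
    # KL(teacher || student) on softened outputs, scaled by T^2.
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    return T * T * sum(t * math.log(t / s) for t, s in zip(p_t, p_s) if t > 0)

def linear(x, W):
    # W holds one weight row per output unit.
    return [sum(w * v for w, v in zip(row, x)) for row in W]

def forward(x, params):
    # Shared layer (ReLU) feeding one linear branch per ontology node.
    h = [max(0.0, v) for v in linear(x, params["shared"])]
    return {node: linear(h, W) for node, W in params["branches"].items()}

def joint_loss(x, labels, params, teacher_logits, alpha=0.5, T=2.0):
    # Assumed form: sum of per-node cross-entropy plus weighted distillation.
    logits = forward(x, params)
    loss = sum(cross_entropy(logits[node], labels[node]) for node in logits)
    for node, t_logits in teacher_logits.items():
        loss += alpha * distill_kl(logits[node], t_logits, T)
    return loss

# Toy setup: 3 input features, 2 shared units, a coarse node and a fine node.
params = {
    "shared": [[0.5, -0.2, 0.1], [0.3, 0.8, -0.5]],
    "branches": {
        "root": [[1.0, -1.0], [-1.0, 1.0]],              # 2 coarse classes
        "tops": [[0.2, 0.4], [0.1, -0.3], [-0.5, 0.6]],  # 3 fine classes
    },
}
x = [1.0, 2.0, 3.0]
labels = {"root": 0, "tops": 2}
teacher = {"tops": [0.0, 0.1, 1.5]}  # hypothetical teacher logits
loss = joint_loss(x, labels, params, teacher)
```

Because every branch reads the same shared features, a gradient of this single scalar loss would update the shared layer and all branches together, which is the sense in which the abstract speaks of refining them simultaneously. A real implementation would use a deep learning framework with automatic differentiation rather than hand-written lists.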
