Fine-Grained Fashion Similarity Prediction by Attribute-Specific Embedding Learning

This paper strives to predict fine-grained fashion similarity. In this similarity paradigm, one should pay more attention to the similarity between fashion items in terms of a specific design/attribute, for example, whether the collar designs of two clothing items are similar. Such similarity has potential value in many fashion-related applications, such as fashion copyright protection. To this end, we propose an Attribute-Specific Embedding Network (ASEN) that jointly learns multiple attribute-specific embeddings and thus measures the fine-grained similarity in the corresponding spaces. The proposed ASEN consists of a global branch and a local branch. The global branch takes the whole image as input to extract features from a global perspective, while the local branch takes as input the zoomed-in region of interest (RoI) w.r.t. the specified attribute and is thus able to extract more fine-grained features. As the two branches extract features from different perspectives, they are complementary to each other. Additionally, each branch integrates two attention modules, i.e., Attribute-aware Spatial Attention and Attribute-aware Channel Attention, which enable ASEN to locate the related regions and capture the essential patterns under the guidance of the specified attribute, so that the learned attribute-specific embeddings better reflect the fine-grained similarity. Extensive experiments on three fashion-related datasets, i.e., FashionAI, DARN, and DeepFashion, show the effectiveness of ASEN for fine-grained fashion similarity prediction and its potential for fashion reranking. Code and data are available at
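The attribute-guided attention described above can be illustrated with a toy, single-branch sketch in pure Python. This is not the authors' ASEN implementation. The class name `ToyAttributeEmbedder`, the dimensions, the random (untrained) weights, the tanh/softmax/sigmoid choices, and the single-layer channel gate are all assumptions made for illustration. The global/local two-branch design and training are omitted. The sketch only shows the data flow: an attribute vector scores each spatial location, the softmax-normalized scores pool the feature map, a sigmoid gate conditioned on the attribute reweights the channels, and similarity is the cosine between the resulting attribute-specific embeddings.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(m, v):
    # m is a list of rows; returns the matrix-vector product m @ v
    return [dot(row, v) for row in m]


def softmax(xs):
    peak = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def cosine(u, v):
    nu, nv = math.sqrt(dot(u, u)), math.sqrt(dot(v, v))
    return dot(u, v) / (nu * nv) if nu and nv else 0.0


class ToyAttributeEmbedder:
    """Hypothetical single-branch model: attribute-guided spatial, then channel, attention."""

    def __init__(self, num_attrs, channels, hidden, seed=0):
        rng = random.Random(seed)

        def rand(rows, cols):
            return [[rng.gauss(0.0, 0.5) for _ in range(cols)] for _ in range(rows)]

        self.channels = channels
        self.attr_table = rand(num_attrs, hidden)        # one vector per attribute (assumed)
        self.w_feat = rand(hidden, channels)             # projects each location's feature
        self.w_score = rand(1, hidden)[0]                # maps a guided vector to a score
        self.w_gate = rand(channels, channels + hidden)  # channel gate from [pooled; attr]

    def spatial_weights(self, feature_map, attr):
        # Score every location by how well its projected feature matches the attribute.
        a = [math.tanh(x) for x in self.attr_table[attr]]
        scores = []
        for f in feature_map:
            h = [math.tanh(x) for x in matvec(self.w_feat, f)]
            scores.append(dot(self.w_score, [hi * ai for hi, ai in zip(h, a)]))
        return softmax(scores)  # one positive weight per location, summing to 1

    def embed(self, feature_map, attr):
        weights = self.spatial_weights(feature_map, attr)
        # Attention-weighted pooling over locations yields a C-dimensional vector.
        pooled = [sum(w * f[c] for w, f in zip(weights, feature_map))
                  for c in range(self.channels)]
        # Channel attention: sigmoid gates conditioned on the pooled feature and the attribute.
        gate = [sigmoid(z) for z in matvec(self.w_gate, pooled + self.attr_table[attr])]
        return [g * p for g, p in zip(gate, pooled)]

    def similarity(self, map_a, map_b, attr):
        # Fine-grained similarity = cosine similarity in the attribute-specific space.
        return cosine(self.embed(map_a, attr), self.embed(map_b, attr))


if __name__ == "__main__":
    rng = random.Random(42)
    # Stand-ins for CNN feature maps: 16 spatial locations x 8 channels each.
    img1 = [[rng.random() for _ in range(8)] for _ in range(16)]
    img2 = [[rng.random() for _ in range(8)] for _ in range(16)]
    model = ToyAttributeEmbedder(num_attrs=3, channels=8, hidden=4)
    for attr in range(3):  # e.g. attribute 0 could stand for "collar design"
        print(attr, model.similarity(img1, img2, attr))
```

Because the same pair of images is embedded separately for each attribute, the sketch can return a different similarity per attribute, which is the property the attribute-specific embedding spaces are meant to provide.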
