Paper Information - Fine-Grained Fashion Similarity Learning by Attribute-Specific Embedding Network

This paper strives to learn fine-grained fashion similarity. In this similarity paradigm, one should pay more attention to the similarity in terms of a specific design/attribute among fashion items, which has potential value in many fashion-related applications such as fashion copyright protection. To this end, we propose an Attribute-Specific Embedding Network (ASEN) that jointly learns multiple attribute-specific embeddings in an end-to-end manner and thus measures fine-grained similarity in the corresponding space. With two attention modules, i.e., Attribute-aware Spatial Attention and Attribute-aware Channel Attention, ASEN is able to locate the related regions and capture the essential patterns under the guidance of the specified attribute, thus making the learned attribute-specific embeddings better reflect the fine-grained similarity. Extensive experiments on four fashion-related datasets show the effectiveness of ASEN for fine-grained fashion similarity learning and its potential for fashion reranking. Code and data are available at https://github.com/Maryeon/asen.
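The idea of an attribute-conditioned embedding can be illustrated with a small NumPy sketch. It is not the authors' implementation (their code is at the repository linked in the abstract) and only mirrors the two steps the abstract names: a spatial attention step that weights image regions under the guidance of an attribute, followed by a channel attention step that re-weights feature channels for the same attribute. All function names, projection matrices (`w_attr`, `w1`, `w2`), dimensions, and the random initialization are assumptions made for illustration.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    e = np.exp(x - x.max())
    return e / e.sum()


def attribute_spatial_attention(feat, attr_vec, w_attr):
    """Pool a (C, H, W) feature map with attribute-guided spatial weights.

    w_attr is a hypothetical (C, D) projection of the attribute embedding.
    """
    c, h, w = feat.shape
    query = np.tanh(w_attr @ attr_vec)        # (C,) attribute query
    flat = feat.reshape(c, h * w)             # (C, H*W) one column per location
    scores = query @ flat / np.sqrt(c)        # (H*W,) relevance of each location
    weights = softmax(scores)                 # spatial weights, sum to 1
    pooled = flat @ weights                   # (C,) attention-weighted average
    return pooled, weights.reshape(h, w)


def attribute_channel_attention(pooled, attr_vec, w1, w2):
    """Gate the channels of a pooled (C,) feature using the attribute.

    w1 (R, C + D) and w2 (C, R) are hypothetical bottleneck weights.
    """
    joint = np.concatenate([pooled, attr_vec])
    hidden = np.maximum(w1 @ joint, 0.0)             # ReLU bottleneck
    gate = 1.0 / (1.0 + np.exp(-(w2 @ hidden)))      # sigmoid gate in (0, 1)
    return pooled * gate


def embed(feat, attr_vec, params):
    """Return an L2-normalized attribute-specific embedding and the attention map."""
    pooled, attn = attribute_spatial_attention(feat, attr_vec, params["w_attr"])
    gated = attribute_channel_attention(pooled, attr_vec, params["w1"], params["w2"])
    return gated / (np.linalg.norm(gated) + 1e-12), attn


def similarity(feat_a, feat_b, attr_vec, params):
    """Cosine similarity of two images in the space of one attribute."""
    emb_a, _ = embed(feat_a, attr_vec, params)
    emb_b, _ = embed(feat_b, attr_vec, params)
    return float(emb_a @ emb_b)


def make_params(c, d, r, rng):
    """Random stand-ins for weights that would normally be learned end to end."""
    return {
        "w_attr": rng.normal(size=(c, d)),
        "w1": rng.normal(size=(r, c + d)),
        "w2": rng.normal(size=(c, r)),
    }
```

In this sketch the same image pair can receive a different score for each attribute vector, which is the sense in which the similarity is fine-grained. In a trained model the feature maps would come from a CNN backbone and the weights would be learned rather than sampled.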
