An End-to-End Attention-Based Neural Model for Complementary Clothing Matching

In modern society, people increasingly prefer fashionable, well-coordinated outfits that meet more than basic physiological needs. A proper outfit usually relies on good matching among the complementary fashion items that compose it (e.g., the top, bottom, and shoes), which motivates us to investigate automatic complementary clothing matching. This task is non-trivial for three reasons. First, it is difficult to accurately model the compatibility between complementary fashion items (e.g., a top and a bottom) that come from heterogeneous spaces and are described by multiple modalities (e.g., visual and textual). Second, different features of fashion items (e.g., color, style, and pattern) may contribute differently to compatibility, so the model must encode the confidence of each pairwise feature. Third, the latent representations of the multi-modal data and the compatibility between complementary items should be learned jointly. To this end, we present an end-to-end attention-based neural framework for compatibility modeling, in which a feature-level attention model adaptively learns the confidence of different pairwise features. Extensive experiments on a publicly available real-world dataset show that our model outperforms state-of-the-art methods.
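To make the idea of feature-level attention concrete, the following is a minimal sketch and not the model described in this paper. It assumes that a top and a bottom have already been mapped to latent vectors of equal length, that the pairwise features are their element-wise products, and that the attention scores come from a single linear layer. The names `attentive_compatibility`, `W`, and `c` are hypothetical and chosen only for illustration.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    shifted = x - np.max(x)
    exp = np.exp(shifted)
    return exp / exp.sum()


def attentive_compatibility(top, bottom, W, c):
    """Illustrative compatibility score with feature-level attention.

    top, bottom : latent vectors of shape (d,), assumed to be given
    W, c        : hypothetical attention parameters, shapes (d, d) and (d,)
    """
    # Assumption: one pairwise feature per latent dimension.
    pairwise = top * bottom
    # One attention score per pairwise feature, normalized to sum to 1.
    weights = softmax(W @ pairwise + c)
    # Compatibility is the attention-weighted sum of the pairwise features.
    score = float(weights @ pairwise)
    return score, weights


# Random inputs stand in for learned representations and parameters.
rng = np.random.default_rng(0)
d = 8
top = rng.normal(size=d)
bottom = rng.normal(size=d)
W = rng.normal(scale=0.1, size=(d, d))
c = np.zeros(d)

score, weights = attentive_compatibility(top, bottom, W, c)
print("compatibility score:", score)
print("attention weights:", np.round(weights, 3))
```

In this sketch the weights play the role of the per-feature confidence: a larger weight lets the corresponding pairwise feature contribute more to the final score. In an end-to-end setting, `W` and `c` would be trained together with the networks that produce `top` and `bottom`.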
