Learning Type-Aware Embeddings for Fashion Compatibility

Outfits in online fashion data are composed of items of many different types (e.g. top, bottom, shoes) that share some stylistic relationship with one another. A representation for building outfits requires a method that can learn both notions of similarity (for example, when two tops are interchangeable) and compatibility (items of possibly different type that can go together in an outfit). This paper presents an approach to learning an image embedding that respects item type, and jointly learns notions of item similarity and compatibility in an end-to-end model. To evaluate the learned representation, we crawled 68,306 outfits created by users on the Polyvore website. Our approach obtains 3–5% improvement over the state-of-the-art on outfit compatibility prediction and fill-in-the-blank tasks using our dataset, as well as an established smaller dataset, while supporting a variety of useful queries (Code and data: https://github.com/mvasil/fashion-compatibility).

[1]  Jiebo Luo,et al.  Mining Fashion Outfit Composition Using an End-to-End Deep Learning Approach on Set Data , 2016, IEEE Transactions on Multimedia.

[2]  Sanja Fidler,et al.  Be Your Own Prada: Fashion Synthesis with Structural Coherence , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[3]  Rainer Stiefelhagen,et al.  Fashion Forward: Forecasting Visual Style in Fashion , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[4]  Serge J. Belongie,et al.  Conditional Similarity Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Kristen Grauman,et al.  Learning the Latent “Look”: Unsupervised Discovery of a Style-Coherent Embedding from Fashion Images , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[6]  Yu-Gang Jiang,et al.  Learning Fashion Compatibility with Bidirectional LSTMs , 2017, ACM Multimedia.

[7]  C. V. Jawahar,et al.  Self-Supervised Learning of Visual Features through Embedding Images into Text Topic Spaces , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Xiaogang Wang,et al.  DeepFashion: Powering Robust Clothes Recognition and Retrieval with Rich Annotations , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[9]  George Vogiatzis,et al.  Dress Like a Star: Retrieving Fashion Products from Videos , 2017, 2017 IEEE International Conference on Computer Vision Workshops (ICCVW).

[10]  Sergey Ioffe,et al.  Rethinking the Inception Architecture for Computer Vision , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[11]  Lior Wolf,et al.  Fisher Vectors Derived from Hybrid Gaussian-Laplacian Mixture Models for Image Annotation , 2014, ArXiv.

[12]  Minlie Huang,et al.  SSP: Semantic Space Projection for Knowledge Graph Embedding with Text Descriptions , 2016, AAAI.

[13]  Francesc Moreno-Noguer,et al.  Multi-modal Embedding for Main Product Detection in Fashion , 2017, 2017 IEEE International Conference on Computer Vision Workshops (ICCVW).

[14]  Robinson Piramuthu,et al.  Style Finder: Fine-Grained Clothing Style Detection and Retrieval , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition Workshops.

[15]  Alexander J. Smola,et al.  Sampling Matters in Deep Embedding Learning , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[16]  Julian J. McAuley,et al.  Learning Compatibility Across Categories for Heterogeneous Item Recommendation , 2016, 2016 IEEE 16th International Conference on Data Mining (ICDM).

[17]  Changsheng Xu,et al.  Hi, magic closet, tell me what to wear! , 2012, ACM Multimedia.

[18]  Amaia Salvador,et al.  Learning Cross-Modal Embeddings for Cooking Recipes and Food Images , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Xiao Zhang,et al.  Learning Unified Embedding for Apparel Recognition , 2017, 2017 IEEE International Conference on Computer Vision Workshops (ICCVW).

[20]  Ranjitha Kumar,et al.  The Elements of Fashion Style , 2016, UIST.

[21]  Alexander C. Berg,et al.  Hipster Wars: Discovering Elements of Fashion Styles , 2014, ECCV.

[22]  Ian D. Reid,et al.  Fast Training of Triplet-Based Deep Binary Embedding Networks , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Jian Dong,et al.  Deep domain adaptation for describing people based on fine-grained clothing attributes , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[24]  Hedi Ben-younes,et al.  Leveraging Weakly Annotated Data for Fashion Image Retrieval and Label Prediction , 2017, 2017 IEEE International Conference on Computer Vision Workshops (ICCVW).

[25]  Svetlana Lazebnik,et al.  Where to Buy It: Matching Street Clothing Photos in Online Shops , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[26]  Yong Jae Lee,et al.  End-to-End Localization and Ranking for Relative Attributes , 2016, ECCV.

[27]  Kavita Bala,et al.  Learning visual similarity for product design with convolutional neural networks , 2015, ACM Trans. Graph..

[28]  Serge J. Belongie,et al.  Learning Visual Clothing Style with Heterogeneous Dyadic Co-Occurrences , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[29]  Thorsten Joachims,et al.  Learning a Distance Metric from Relative Comparisons , 2003, NIPS.

[30]  Francesc Moreno-Noguer,et al.  Neuroaesthetics in fashion: Modeling the perception of fashionability , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Kristen Grauman,et al.  Semantic Jitter: Dense Supervision for Visual Comparisons via Synthetic Images , 2016, 2017 IEEE International Conference on Computer Vision (ICCV).

[32]  James Philbin,et al.  FaceNet: A unified embedding for face recognition and clustering , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  Huizhong Chen,et al.  Describing Clothing by Semantic Attributes , 2012, ECCV.

[34]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[35]  Kristen Grauman,et al.  Just Noticeable Differences in Visual Attributes , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[36]  Geoffrey E. Hinton,et al.  Visualizing Data using t-SNE , 2008 .

[37]  Takayuki Okatani,et al.  Mix and Match: Joint Model for Clothing and Attribute Recognition , 2015, BMVC.

[38]  Jeffrey Dean,et al.  Efficient Estimation of Word Representations in Vector Space , 2013, ICLR.

[39]  Francesc Moreno-Noguer,et al.  Multi-modal joint embedding for fashion product retrieval , 2017, 2017 IEEE International Conference on Image Processing (ICIP).

[40]  Hiroshi Ishikawa,et al.  Fashion Style in 128 Floats: Joint Ranking and Classification Using Weak Data for Feature Extraction , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[41]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[42]  Yann LeCun,et al.  Signature Verification Using A "Siamese" Time Delay Neural Network , 1993, Int. J. Pattern Recognit. Artif. Intell..

[43]  Kristen Grauman,et al.  Fine-Grained Visual Comparisons with Local Learning , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[44]  Bo Zhao,et al.  Memory-Augmented Attribute Manipulation Networks for Interactive Fashion Search , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).