Clothing Matching Based on Multi-modal Data

Clothing, as a kind of beauty-enhancing product, plays an important role in people's daily life. People want to look good by dressing properly. Nevertheless, not everyone is good at clothing matching and thus able to compose aesthetically pleasing outfits. Fortunately, certain fashion-oriented online communities (e.g., Polyvore) allow fashion experts to share their outfit compositions with the public. Each outfit composition there usually consists of several complementary items (e.g., tops, bottoms, and shoes), where both a visual image and a textual title are available for each item. In this work, we aim to take full advantage of such rich fashion data to decode the secret of clothing matching. Essentially, we propose a method (CMVT) that comprehensively measures the compatibility among fashion items by integrating their multi-modal data. Extensive experiments conducted on a real-world dataset demonstrate the effectiveness of the proposed model.
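To make the multi-modal compatibility idea concrete, below is a minimal sketch, not the paper's actual CMVT architecture: it assumes each item comes with a pre-extracted visual feature (e.g., from a CNN) and a textual feature (e.g., from its title), projects both into a shared latent space, scores a top-bottom pair by the inner product of the fused embeddings, and trains with a BPR-style pairwise loss. All dimensions, the fusion-by-summation choice, and the loss are illustrative assumptions.

```python
# Hypothetical sketch of multi-modal compatibility scoring (not the authors' code).
import torch
import torch.nn as nn

class MultiModalCompatibility(nn.Module):
    def __init__(self, visual_dim=4096, text_dim=300, latent_dim=128):
        super().__init__()
        # Separate linear projections map each modality into a shared latent space.
        self.visual_proj = nn.Linear(visual_dim, latent_dim)
        self.text_proj = nn.Linear(text_dim, latent_dim)

    def embed(self, visual, text):
        # Fuse the two modalities by summing their (nonlinear) latent projections.
        return torch.tanh(self.visual_proj(visual)) + torch.tanh(self.text_proj(text))

    def forward(self, top_visual, top_text, bottom_visual, bottom_text):
        top = self.embed(top_visual, top_text)
        bottom = self.embed(bottom_visual, bottom_text)
        # Compatibility score: inner product of the fused item embeddings.
        return (top * bottom).sum(dim=-1)

def bpr_loss(pos_score, neg_score):
    # BPR-style objective: a compatible (positive) bottom should score higher
    # than a randomly sampled (negative) bottom for the same top.
    return -torch.log(torch.sigmoid(pos_score - neg_score)).mean()

if __name__ == "__main__":
    model = MultiModalCompatibility()
    top_v, top_t = torch.randn(8, 4096), torch.randn(8, 300)
    pos_v, pos_t = torch.randn(8, 4096), torch.randn(8, 300)
    neg_v, neg_t = torch.randn(8, 4096), torch.randn(8, 300)
    loss = bpr_loss(model(top_v, top_t, pos_v, pos_t),
                    model(top_v, top_t, neg_v, neg_t))
    print(loss.item())
```

Summing the two modality projections is just one simple fusion choice; concatenation or gated fusion would fit the same interface.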
