NeuroStylist: Neural Compatibility Modeling for Clothing Matching

Nowadays, as a beauty-enhancing product, clothing plays an important role in human's social life. In fact, the key to a proper outfit usually lies in the harmonious clothing matching. Nevertheless, not everyone is good at clothing matching. Fortunately, with the proliferation of fashion-oriented online communities, fashion experts can publicly share their fashion tips by showcasing their outfit compositions, where each fashion item (e.g., a top or bottom) usually has an image and context metadata (e.g., title and category). Such rich fashion data offer us a new opportunity to investigate the code in clothing matching. However, challenges co-exist with opportunities. The first challenge lies in the complicated factors, such as color, material and shape, that affect the compatibility of fashion items. Second, as each fashion item involves multiple modalities (i.e., image and text), how to cope with the heterogeneous multi-modal data also poses a great challenge. Third, our pilot study shows that the composition relation between fashion items is rather sparse, which makes traditional matrix factorization methods not applicable. Towards this end, in this work, we propose a content-based neural scheme to model the compatibility between fashion items based on the Bayesian personalized ranking (BPR) framework. The scheme is able to jointly model the coherent relation between modalities of items and their implicit matching preference. Experiments verify the effectiveness of our scheme, and we deliver deep insights that can benefit future research.

[1]  Tat-Seng Chua,et al.  Micro Tells Macro: Predicting the Popularity of Micro-Videos via a Transductive Model , 2016, ACM Multimedia.

[2]  Meng Wang,et al.  Multimedia answering: enriching text QA with media information , 2011, SIGIR.

[3]  Robinson Piramuthu,et al.  Large scale visual recommendations from street fashion images , 2014, KDD.

[4]  Robinson Piramuthu,et al.  Style Finder: Fine-Grained Clothing Style Detection and Retrieval , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition Workshops.

[5]  Balaraman Ravindran,et al.  Bridge Correlational Neural Networks for Multilingual Multimodal Representation Learning , 2015, NAACL.

[6]  Luming Zhang,et al.  Interest Inference via Structure-Constrained Multi-Source Multi-Task Learning , 2015, IJCAI.

[7]  Luis E. Ortiz,et al.  Retrieving Similar Styles to Parse Clothing , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[8]  Tat-Seng Chua,et al.  Item Silk Road: Recommending Items from Information Domains to Social Users , 2017, SIGIR.

[9]  Wenwu Zhu,et al.  Deep Multimodal Hashing with Orthogonal Regularization , 2015, IJCAI.

[10]  Yue Gao,et al.  Attribute-augmented semantic hierarchy: towards bridging semantic gap and intention gap in image retrieval , 2013, ACM Multimedia.

[11]  Shih-Fu Chang,et al.  Visual Translation Embedding Network for Visual Relation Detection , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[12]  Ye Wang,et al.  Improving Content-based and Hybrid Music Recommendation using Deep Learning , 2014, ACM Multimedia.

[13]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[14]  Ruifan Li,et al.  Cross-modal Retrieval with Correspondence Autoencoder , 2014, ACM Multimedia.

[15]  Changsheng Xu,et al.  Street-to-shop: Cross-scenario clothing retrieval via parts alignment and auxiliary set , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[16]  Trevor Darrell,et al.  Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[17]  Lars Schmidt-Thieme,et al.  BPR: Bayesian Personalized Ranking from Implicit Feedback , 2009, UAI.

[18]  Raffay Hamid,et al.  What makes an image popular? , 2014, WWW.

[19]  Yi Yang,et al.  Fast and Accurate Content-based Semantic Search in 100M Internet Videos , 2015, ACM Multimedia.

[20]  Francesc Moreno-Noguer,et al.  Neuroaesthetics in fashion: Modeling the perception of fashionability , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[21]  Xing Xie,et al.  Mining city landmarks from blogs by graph modeling , 2009, ACM Multimedia.

[22]  Tomoharu Iwata,et al.  Fashion Coordinates Recommender System Using Photographs from Fashion Magazines , 2011, IJCAI.

[23]  Juhan Nam,et al.  Multimodal Deep Learning , 2011, ICML.

[24]  Meng Wang,et al.  Harvesting visual concepts for image search with complex queries , 2012, ACM Multimedia.

[25]  L. Bottou Stochastic Gradient Learning in Neural Networks , 1991 .

[26]  Xuelong Li,et al.  Visual-Textual Joint Relevance Learning for Tag-Based Social Image Search , 2013, IEEE Transactions on Image Processing.

[27]  N. Latha,et al.  Personalized Recommendation Combining User Interest and Social Circle , 2015 .

[28]  Larry S. Davis,et al.  Collaborative Fashion Recommendation: A Functional Tensor Factorization Approach , 2015, ACM Multimedia.

[29]  Yee Whye Teh,et al.  A Fast Learning Algorithm for Deep Belief Nets , 2006, Neural Computation.

[30]  Svetlana Lazebnik,et al.  Where to Buy It: Matching Street Clothing Photos in Online Shops , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[31]  Rob Hall,et al.  Style in the long tail: discovering unique interests with latent variable models in large scale social E-commerce , 2014, KDD.

[32]  David A. McAllester,et al.  A discriminatively trained, multiscale, deformable part model , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[33]  Meng Wang,et al.  Oracle in Image Search: A Content-Based Approach to Performance Prediction , 2012, TOIS.

[34]  Yehuda Koren,et al.  Factorization meets the neighborhood: a multifaceted collaborative filtering model , 2008, KDD.

[35]  Aviv Eisenschtat,et al.  Linking Image and Text with 2-Way Nets , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[36]  Jiebo Luo,et al.  Mining Fashion Outfit Composition Using an End-to-End Deep Learning Approach on Set Data , 2016, IEEE Transactions on Multimedia.

[37]  Tat-Seng Chua,et al.  Learning from Collective Intelligence , 2016, ACM Trans. Multim. Comput. Commun. Appl..

[38]  Preslav Nakov,et al.  A non-IID Framework for Collaborative Filtering with Restricted Boltzmann Machines , 2013, ICML.

[39]  Tat-Seng Chua,et al.  Neural Collaborative Filtering , 2017, WWW.

[40]  Anton van den Hengel,et al.  Image-Based Recommendations on Styles and Substitutes , 2015, SIGIR.

[41]  Luis E. Ortiz,et al.  Parsing clothing in fashion photographs , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[42]  Tat-Seng Chua,et al.  Online Collaborative Learning for Open-Vocabulary Visual Classifiers , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[43]  Lars Schmidt-Thieme,et al.  Pairwise interaction tensor factorization for personalized tag recommendation , 2010, WSDM '10.

[44]  Wenwu Zhu,et al.  Structural Deep Network Embedding , 2016, KDD.

[45]  Guigang Zhang,et al.  Deep Learning , 2016, Int. J. Semantic Comput..

[46]  Luming Zhang,et al.  Multiple Social Network Learning and Its Application in Volunteerism Tendency Prediction , 2015, SIGIR.

[47]  Changsheng Xu,et al.  Hi, magic closet, tell me what to wear! , 2012, ACM Multimedia.

[48]  Julian J. McAuley,et al.  VBPR: Visual Bayesian Personalized Ranking from Implicit Feedback , 2015, AAAI.