CRAFT: Complementary Recommendation by Adversarial Feature Transform

We propose a framework that harnesses visual cues in an unsupervised manner to learn the co-occurrence distribution of items in real-world images for complementary recommendation. Our model learns a non-linear transformation between the two manifolds of source and target item categories (e.g., tops and bottoms in outfits). Given a large dataset of images containing instances of co-occurring items, we train a generative transformer network directly on the feature representation by casting it as an adversarial optimization problem. Such a conditional generative model can produce multiple novel samples of complementary items (in the feature space) for a given query item. We demonstrate our framework for the task of recommending complementary top apparel for a given bottom clothing item. The recommendations made by our system are diverse, and are favored by human experts over the baseline approaches.

[1]  Jian Dong,et al.  Deep Human Parsing with Active Template Regression , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[2]  Antonio Torralba,et al.  Generating the Future with Adversarial Transformers , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[3]  Tomas Pfister,et al.  Learning from Simulated and Unsupervised Images through Adversarial Training , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[4]  Marius Leordeanu,et al.  Unsupervised Learning from Video to Detect Foreground Objects in Single Images , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[5]  拓海 杉山,et al.  “Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks”の学習報告 , 2017 .

[6]  Andrew Zisserman,et al.  Multi-task Self-Supervised Visual Learning , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[7]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[8]  Guillaume Lample,et al.  Fader Networks: Manipulating Images by Sliding Attributes , 2017, NIPS.

[9]  Anton van den Hengel,et al.  Image-Based Recommendations on Styles and Substitutes , 2015, SIGIR.

[10]  Nicu Sebe,et al.  Content-based multimedia information retrieval: State of the art and challenges , 2006, TOMCCAP.

[11]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[12]  Julian J. McAuley,et al.  Learning Compatibility Across Categories for Heterogeneous Item Recommendation , 2016, 2016 IEEE 16th International Conference on Data Mining (ICDM).

[13]  Dumitru Erhan,et al.  Unsupervised Pixel-Level Domain Adaptation with Generative Adversarial Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[14]  Svetlana Lazebnik,et al.  Where to Buy It: Matching Street Clothing Photos in Online Shops , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[15]  Yoshua Bengio,et al.  Generative Adversarial Nets , 2014, NIPS.

[16]  Alexei A. Efros,et al.  Image-to-Image Translation with Conditional Adversarial Networks , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Yehuda Koren,et al.  Advances in Collaborative Filtering , 2011, Recommender Systems Handbook.

[18]  Silvio Savarese,et al.  Adversarial Feature Augmentation for Unsupervised Domain Adaptation , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[19]  Jaakko Lehtinen,et al.  Progressive Growing of GANs for Improved Quality, Stability, and Variation , 2017, ICLR.

[20]  Hanqing Lu,et al.  Street-to-shop: Cross-scenario clothing retrieval via parts alignment and auxiliary set , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[21]  Andrew L. Maas Rectifier Nonlinearities Improve Neural Network Acoustic Models , 2013 .

[22]  Jan Kautz,et al.  High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[23]  Serge J. Belongie,et al.  Learning Visual Clothing Style with Heterogeneous Dyadic Co-Occurrences , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[24]  Trevor Darrell,et al.  Adversarial Feature Learning , 2016, ICLR.

[25]  Jimmy Ba,et al.  Adam: A Method for Stochastic Optimization , 2014, ICLR.

[26]  Trevor Darrell,et al.  Adversarial Discriminative Domain Adaptation , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Geoffrey E. Hinton,et al.  Visualizing Data using t-SNE , 2008 .

[28]  Yu-Gang Jiang,et al.  Learning Fashion Compatibility with Bidirectional LSTMs , 2017, ACM Multimedia.

[29]  Sergey Ioffe,et al.  Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[30]  Sergey Ioffe,et al.  Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning , 2016, AAAI.

[31]  Wojciech Zaremba,et al.  Improved Techniques for Training GANs , 2016, NIPS.

[32]  Sanja Fidler,et al.  Be Your Own Prada: Fashion Synthesis with Structural Coherence , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[33]  Raymond J. Mooney,et al.  Content-boosted collaborative filtering for improved recommendations , 2002, AAAI/IAAI.

[34]  Xiaogang Wang,et al.  Pyramid Scene Parsing Network , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[35]  Gediminas Adomavicius,et al.  Toward the next generation of recommender systems: a survey of the state-of-the-art and possible extensions , 2005, IEEE Transactions on Knowledge and Data Engineering.

[36]  Yoshua Bengio,et al.  Mode Regularized Generative Adversarial Networks , 2016, ICLR.