Deep Bidirectional Cross-Triplet Embedding for Online Clothing Shopping

In this article, we address the cross-domain (i.e., street and shop) clothing retrieval problem and investigate its real-world applications for online clothing shopping. The problem is challenging because of the large discrepancy between street-domain and shop-domain images. We focus on learning an effective feature-embedding model that generates robust and discriminative feature representations across domains. Existing triplet embedding models achieve promising results by finding an embedding metric in which the distance between negative pairs is larger than the distance between positive pairs plus a margin. However, existing methods do not sufficiently address the challenges of the cross-domain clothing retrieval scenario. First, the intradomain and cross-domain data relationships need to be considered simultaneously. Second, the numbers of matched and nonmatched cross-domain pairs are unbalanced. To address these challenges, we propose a deep cross-triplet embedding algorithm together with a cross-triplet sampling strategy. Extensive experimental evaluations demonstrate the effectiveness of the proposed algorithms. Furthermore, we investigate two novel online shopping applications, clothing try-on and accessories recommendation, based on a unified cross-domain clothing retrieval framework.
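The margin constraint used by generic triplet embedding models, as described above, can be illustrated with a minimal sketch. This is the standard triplet hinge loss, not the cross-triplet formulation proposed in the article, which the abstract does not spell out. The function names, the Euclidean distance, the margin value of 0.2, and the toy two-dimensional features are all assumptions made for illustration.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_hinge_loss(anchor, positive, negative, margin=0.2):
    """Generic triplet hinge loss (illustrative, not the article's loss).

    The loss is zero once the anchor-negative distance exceeds the
    anchor-positive distance by at least `margin`; otherwise it grows
    linearly with the size of the violation.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)


# Toy 2-D embeddings (hypothetical values, chosen only for the example).
street_query = [0.0, 0.0]
matching_shop_item = [0.3, 0.4]  # closer to the query
other_shop_item = [0.6, 0.8]     # farther from the query

# Constraint satisfied: the nonmatching item is farther by more than the margin.
satisfied = triplet_hinge_loss(street_query, matching_shop_item, other_shop_item)

# Constraint violated: the roles are swapped, so the loss is positive.
violated = triplet_hinge_loss(street_query, other_shop_item, matching_shop_item)

print(satisfied, violated)
```

In this sketch the anchor is a street image and the positive and negative are shop images, which covers only one cross-domain direction. How intradomain relationships and the imbalance between matched and nonmatched pairs are handled is specific to the proposed method and is not reproduced here.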