Towards Predicting the Likeability of Fashion Images

In this paper, we propose a method for ranking fashion images to find the ones which might be liked by more people. We collect two new datasets from image sharing websites (Pinterest and Polyvore). We represent fashion images based on attributes: semantic attributes and data-driven attributes. To learn semantic attributes from limited training data, we use an algorithm on multi-task convolutional neural networks to share visual knowledge among different semantic attribute categories. To discover data-driven attributes unsupervisedly, we propose an algorithm to simultaneously discover visual clusters and learn fashion-specific feature representations. Given attributes as representations, we propose to learn a ranking SPN (sum product networks) to rank pairs of fashion images. The proposed ranking SPN can capture the high-order correlations of the attributes. We show the effectiveness of our method on our two newly collected datasets.

[1]  Eric Gilbert,et al.  Faces engage us: photos with faces attract more likes and comments on Instagram , 2014, CHI.

[2]  Ali Farhadi,et al.  Describing objects by their attributes , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[3]  Yoshua Bengio,et al.  Shallow vs. Deep Sum-Product Networks , 2011, NIPS.

[4]  Gang Wang,et al.  Multi-Task CNN Model for Attribute Prediction , 2015, IEEE Transactions on Multimedia.

[5]  Changsheng Xu,et al.  Street-to-shop: Cross-scenario clothing retrieval via parts alignment and auxiliary set , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[6]  Ian Davidson,et al.  Learning Multiple Relative Attributes With Humans in the Loop , 2014, IEEE Transactions on Image Processing.

[7]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[8]  Kristen Grauman,et al.  Sharing features between objects and their attributes , 2011, CVPR 2011.

[9]  Alexander C. Berg,et al.  Hipster Wars: Discovering Elements of Fashion Styles , 2014, ECCV.

[10]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[11]  Kristen Grauman,et al.  Decorrelating Semantic Visual Attributes by Resisting the Urge to Share , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[12]  Daniel Lowd,et al.  Learning Sum-Product Networks with Direct and Indirect Variable Interactions , 2014, ICML.

[13]  Vinod Nair,et al.  A joint learning framework for attribute models and object descriptions , 2011, 2011 International Conference on Computer Vision.

[14]  Ali Farhadi,et al.  Attribute Discovery via Predictable Discriminative Binary Codes , 2012, ECCV.

[15]  Adnan Darwiche,et al.  A differential approach to inference in Bayesian networks , 2000, JACM.

[16]  Gang Wang,et al.  Learning deep features for multiple object tracking by using a multi-task learning strategy , 2014, 2014 IEEE International Conference on Image Processing (ICIP).

[17]  Pedro M. Domingos,et al.  Discriminative Learning of Sum-Product Networks , 2012, NIPS.

[18]  Gang Wang,et al.  Modeling fashion , 2013, 2013 IEEE International Conference on Multimedia and Expo (ICME).

[19]  Pascal Vincent,et al.  Representation Learning: A Review and New Perspectives , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[20]  Thore Graepel,et al.  Large Margin Rank Boundaries for Ordinal Regression , 2000 .

[21]  Andrew Blake,et al.  "GrabCut" , 2004, ACM Trans. Graph..

[22]  Gang Wang,et al.  Integrating parametric and non-parametric models for scene labeling , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Xiaogang Wang,et al.  A Deep Sum-Product Architecture for Robust Facial Attributes Analysis , 2013, 2013 IEEE International Conference on Computer Vision.

[24]  Jia Chen,et al.  Unified Structured Learning for Simultaneous Human Pose Estimation and Garment Attribute Classification , 2014, IEEE Transactions on Image Processing.

[25]  Shih-Fu Chang,et al.  Designing Category-Level Attributes for Discriminative Visual Recognition , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[26]  Qiang Zhou,et al.  Learning to Share Latent Tasks for Action Recognition , 2013, 2013 IEEE International Conference on Computer Vision.

[27]  Pedro M. Domingos,et al.  Learning the Structure of Sum-Product Networks , 2013, ICML.

[28]  Huizhong Chen,et al.  Describing Clothing by Semantic Attributes , 2012, ECCV.

[29]  Luis E. Ortiz,et al.  Parsing clothing in fashion photographs , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[30]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[31]  Pedro M. Domingos,et al.  Sum-product networks: A new deep architecture , 2011, 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops).

[32]  Yoshua Bengio,et al.  Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[33]  Trevor Darrell,et al.  Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[34]  David A. McAllester,et al.  Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[35]  Baoxin Li,et al.  Max-Margin Multiattribute Learning With Low-Rank Constraint , 2014, IEEE Transactions on Image Processing.

[36]  Paul A. Viola,et al.  Robust Real-time Object Detection , 2001 .

[37]  Vicente Ordonez,et al.  High level describable attributes for predicting aesthetics and interestingness , 2011, CVPR 2011.

[38]  Trevor Darrell,et al.  DeCAF: A Deep Convolutional Activation Feature for Generic Visual Recognition , 2013, ICML.

[39]  Tsuhan Chen,et al.  Clothing cosegmentation for recognizing people , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[40]  Kristen Grauman,et al.  Relative attributes , 2011, 2011 International Conference on Computer Vision.

[41]  Matthew B. Blaschko,et al.  Taxonomic Multi-class Prediction and Person Layout Using Efficient Structured Ranking , 2012, ECCV.

[42]  Robert R. Sokal,et al.  A statistical method for evaluating systematic relationships , 1958 .

[43]  Antonio Torralba,et al.  Understanding the Intrinsic Memorability of Images , 2011, NIPS.

[44]  Paul A. Viola,et al.  Robust Real-Time Face Detection , 2001, International Journal of Computer Vision.

[45]  Andrea Vedaldi,et al.  Vlfeat: an open and portable library of computer vision algorithms , 2010, ACM Multimedia.

[46]  Tamara L. Berg,et al.  Paper Doll Parsing: Retrieving Similar Styles to Parse Clothing Items , 2013, 2013 IEEE International Conference on Computer Vision.