Memory-Augmented Attribute Manipulation Networks for Interactive Fashion Search

We introduce a new fashion search protocol in which users can manipulate attributes while interacting with the search engine, e.g. changing the color attribute of a piece of clothing from red to blue. This is particularly useful for image-based search when the query image does not perfectly match the user's expectation of the desired product. To build such a search engine, we propose a novel memory-augmented Attribute Manipulation Network (AMNet) that manipulates image representations at the attribute level. Given a query image and the attributes to be modified, AMNet alters the intermediate representation that encodes the unwanted attributes and changes them to the desired ones through the following four components: (1) a dual-path CNN architecture for discriminative deep attribute representation learning, (2) a memory block with an internal memory and a neural controller for learning and hosting prototype attribute representations, (3) an attribute manipulation network that modifies the representation of the query image with the prototype feature retrieved from the memory block, and (4) a loss layer that jointly optimizes the attribute classification loss and a triplet ranking loss over image triplets to facilitate precise attribute manipulation and image retrieval. Extensive experiments on two large-scale fashion search datasets, DARN and DeepFashion, demonstrate that AMNet performs remarkably well against well-designed baselines in terms of both the effectiveness of attribute manipulation and search accuracy.
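
As an illustration of components (2) to (4), the following is a minimal NumPy sketch and not the authors' implementation. It assumes the memory is a matrix with one prototype row per attribute value, that the controller addresses it with a one-hot vector, that manipulation is a residual linear fusion of the query feature and the retrieved prototype, and that the two losses are combined as a weighted sum. All function names, the margin, and the loss weight are hypothetical.

```python
import numpy as np


def retrieve_prototype(memory, attr_index):
    """Read one prototype row from memory via one-hot addressing (assumed)."""
    one_hot = np.zeros(memory.shape[0])
    one_hot[attr_index] = 1.0
    return one_hot @ memory


def manipulate(query_feat, prototype, fusion_weight):
    """Fuse query feature and prototype; the residual linear form is an assumption."""
    fused = np.concatenate([query_feat, prototype])
    return query_feat + fusion_weight @ fused


def softmax_cross_entropy(logits, label):
    """Attribute classification loss for a single example."""
    shifted = logits - logits.max()  # subtract max for numerical stability
    log_probs = shifted - np.log(np.exp(shifted).sum())
    return -log_probs[label]


def triplet_ranking_loss(anchor, positive, negative, margin=0.5):
    """Hinge loss pushing the positive closer to the anchor than the negative."""
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return max(0.0, margin + d_pos - d_neg)


def joint_loss(logits, label, anchor, positive, negative,
               margin=0.5, rank_weight=1.0):
    """Weighted sum of the two losses (the weighting scheme is an assumption)."""
    cls = softmax_cross_entropy(logits, label)
    rank = triplet_ranking_loss(anchor, positive, negative, margin)
    return cls + rank_weight * rank


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    dim, num_values = 8, 5                      # hypothetical sizes
    memory = rng.normal(size=(num_values, dim))  # one prototype per attribute value
    fusion_weight = rng.normal(size=(dim, 2 * dim)) * 0.1

    query = rng.normal(size=dim)                 # feature of the query image
    target = retrieve_prototype(memory, 3)       # e.g. the prototype for "blue"
    manipulated = manipulate(query, target, fusion_weight)

    positive = rng.normal(size=dim)              # image with the desired attribute
    negative = rng.normal(size=dim)              # image without it
    logits = rng.normal(size=num_values)
    print(joint_loss(logits, 3, manipulated, positive, negative))
```

In this sketch the manipulated feature plays the role of the anchor in the triplet, so the ranking term rewards a manipulated query that lands nearer to images carrying the desired attribute than to images that lack it.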
