Learning Visual Features from Product Title for Image Retrieval

There is strong market demand for image-based product search on e-commerce sites. Visual features play the most important role in this content-based image retrieval task. Most existing methods extract visual features with models pre-trained on other large-scale, well-annotated datasets such as ImageNet. However, because product images differ substantially from ImageNet images, a feature extractor trained on ImageNet is not effective at extracting visual features from product images, and retraining the feature extractor on product images is hampered by the lack of annotated labels. In this paper, we use easily accessible text information, namely the product title, as a supervision signal for learning product image features. Specifically, we use the n-grams extracted from the product title as the labels of the product image to construct a dataset for image classification. This dataset is then used to fine-tune a pre-trained model. Finally, the basic maximum activations of convolutions (MAC) feature is extracted from the fine-tuned model. As a result, we achieve fourth place in the AI Meets Beauty Grand Challenge at ACM Multimedia 2020 using only a single ResNet-50 model, without any human annotations or pre-processing or post-processing tricks. Our code is available at: https://github.com/FangxiangFeng/AI-Meets-Beauty-2020.
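To make the two technical steps above concrete, the following is a minimal sketch, not the authors' implementation. It shows how word n-grams from product titles could be turned into classification labels, and how a MAC descriptor could be computed by taking the spatial maximum of each channel of a convolutional feature map. The function names, the `max_n` and `min_count` parameters, the multi-label treatment, and the final L2 normalization are all assumptions made for illustration; the abstract specifies none of them.

```python
from collections import Counter
import math


def title_ngrams(title, max_n=2):
    """Return the set of word n-grams (n = 1..max_n) of a lowercased title."""
    words = title.lower().split()
    grams = set()
    for n in range(1, max_n + 1):
        for i in range(len(words) - n + 1):
            grams.add(" ".join(words[i:i + n]))
    return grams


def build_label_vocab(titles, max_n=2, min_count=2):
    """Map each n-gram found in at least min_count titles to a class id.

    The frequency threshold is an assumed filter, not taken from the paper.
    """
    counts = Counter()
    for title in titles:
        counts.update(title_ngrams(title, max_n))
    kept = sorted(g for g, c in counts.items() if c >= min_count)
    return {gram: idx for idx, gram in enumerate(kept)}


def title_to_labels(title, vocab, max_n=2):
    """Return the sorted class ids of the vocabulary n-grams in a title."""
    return sorted(vocab[g] for g in title_ngrams(title, max_n) if g in vocab)


def mac_feature(feature_map):
    """Compute a MAC descriptor from a C x H x W feature map (nested lists).

    Each channel is reduced to its maximum over all spatial positions.
    The vector is then L2-normalized (an assumed, common post-step).
    """
    maxima = [max(max(row) for row in channel) for channel in feature_map]
    norm = math.sqrt(sum(v * v for v in maxima))
    return [v / norm for v in maxima] if norm > 0 else maxima


# Hypothetical titles, used only to exercise the functions.
titles = ["red matte lipstick", "matte lipstick set", "rose hand cream"]
vocab = build_label_vocab(titles)
labels = title_to_labels("red matte lipstick", vocab)

# A toy 2-channel, 2 x 2 feature map standing in for CNN activations.
descriptor = mac_feature([[[1.0, 3.0], [2.0, 0.0]], [[0.0, 4.0], [1.0, 2.0]]])
```

In a real pipeline the feature map would come from the last convolutional layer of the fine-tuned network, and retrieval would rank gallery images by the similarity of their descriptors to the query descriptor.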
