Multi-View Product Image Search with Deep ConvNets Representations

Multi-view product image queries can improve retrieval performance over single view queries significantly. In this paper, we investigated the performance of deep convolutional neural networks (ConvNets) on multi-view product image search. First, we trained a VGG-like network to learn deep ConvNets representations of product images. Then, we computed the deep ConvNets representations of database and query images and performed single view queries, and multi-view queries using several early and late fusion approaches. We performed extensive experiments on the publicly available Multi-View Object Image Dataset (MVOD 5K) with both clean background queries from the Internet and cluttered background queries from a mobile phone. We compared the performance of ConvNets to the classical bag-of-visual-words (BoWs). We concluded that (1) multi-view queries with deep ConvNets representations perform significantly better than single view queries, (2) ConvNets perform much better than BoWs and have room for further improvement, (3) pre-training of ConvNets on a different image dataset with background clutter is needed to obtain good performance on cluttered product image queries obtained with a mobile phone.

[1]  Ying Wu,et al.  Mobile Product Image Search by Automatic Query Object Extraction , 2012, ECCV.

[2]  Albert Gordo,et al.  Deep Image Retrieval: Learning Global Representations for Image Search , 2016, ECCV.

[3]  Subhransu Maji,et al.  Multi-view Convolutional Neural Networks for 3D Shape Recognition , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[4]  Jianmin Wang,et al.  Deep Quantization Network for Efficient Image Retrieval , 2016, AAAI.

[5]  Xueming Qian,et al.  Mobile image retrieval using multi-photos as query , 2013, 2013 IEEE International Conference on Multimedia and Expo Workshops (ICMEW).

[6]  Forrest N. Iandola,et al.  SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and <1MB model size , 2016, ArXiv.

[7]  Özgür Ulusoy,et al.  Mobile multi-view object image search , 2015, Multimedia Tools and Applications.

[8]  Chu-Hui Lee,et al.  A Multi-query Strategy forContent-based Image Retrieval , 2011 .

[9]  Bernd Girod,et al.  Mobile Visual Search , 2011, IEEE Signal Processing Magazine.

[10]  김종영 구글 TensorFlow 소개 , 2015 .

[11]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[12]  G. Griffin,et al.  Caltech-256 Object Category Dataset , 2007 .

[13]  Atsuto Maki,et al.  From generic to specific deep representations for visual recognition , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[14]  Bernd Girod,et al.  Mobile Visual Search: Architectures, Technologies, and the Emerging MPEG Standard , 2011, IEEE MultiMedia.

[15]  Luc Van Gool,et al.  Multi-view traffic sign detection, recognition, and 3D localisation , 2014, Machine Vision and Applications.

[16]  Muhammet Bastan,et al.  Multi-view object detection in dual-energy X-ray images , 2015, Machine Vision and Applications.

[17]  Cordelia Schmid,et al.  Local Convolutional Features with Unsupervised Training for Image Retrieval , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[18]  Yang Song,et al.  Learning Fine-Grained Image Similarity with Deep Ranking , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[19]  Victor S. Lempitsky,et al.  Aggregating Deep Convolutional Features for Image Retrieval , 2015, ArXiv.

[20]  Andrew Zisserman,et al.  Multiple queries for large scale specific object retrieval , 2012, BMVC.

[21]  Albert Gordo,et al.  End-to-End Learning of Deep Visual Representations for Image Retrieval , 2016, International Journal of Computer Vision.

[22]  Bernd Girod,et al.  Memory-Efficient Image Databases for Mobile Visual Search , 2014, IEEE MultiMedia.