Paper Information - Multi-View Product Image Search with Deep ConvNets Representations

Multi-View Product Image Search with Deep ConvNets Representations

Multi-view product image queries can significantly improve retrieval performance over single-view queries. In this paper, we investigated the performance of deep convolutional neural networks (ConvNets) on multi-view product image search. First, we trained a VGG-like network to learn deep ConvNets representations of product images. Then, we computed the deep ConvNets representations of the database and query images and performed both single-view queries and multi-view queries, the latter using several early and late fusion approaches. We performed extensive experiments on the publicly available Multi-View Object Image Dataset (MVOD 5K), using both clean-background queries collected from the Internet and cluttered-background queries captured with a mobile phone. We compared the performance of ConvNets to that of the classical bag-of-visual-words (BoWs) approach. We concluded that (1) multi-view queries with deep ConvNets representations perform significantly better than single-view queries, (2) ConvNets perform much better than BoWs and have room for further improvement, and (3) pre-training ConvNets on a different image dataset with background clutter is needed to obtain good performance on cluttered product image queries captured with a mobile phone.
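The abstract does not spell out which early and late fusion variants were used, so the following is only a minimal sketch of the general distinction, under stated assumptions. Early fusion is illustrated here by averaging the per-view query descriptors before a single search. Late fusion is illustrated by searching once per view and combining the per-view similarity scores with a maximum. The function names, the cosine similarity measure, and the toy three-dimensional descriptors are all hypothetical and stand in for real ConvNet features.

```python
import math


def l2_normalize(vec):
    """Scale a descriptor to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def cosine(a, b):
    """Cosine similarity between two descriptors of equal length."""
    a, b = l2_normalize(a), l2_normalize(b)
    return sum(x * y for x, y in zip(a, b))


def early_fusion_scores(query_views, database):
    """Assumed early fusion: average the view descriptors, then search once."""
    dim = len(query_views[0])
    mean = [sum(v[i] for v in query_views) / len(query_views) for i in range(dim)]
    return [cosine(mean, item) for item in database]


def late_fusion_scores(query_views, database, combine=max):
    """Assumed late fusion: search per view, then combine scores per database item."""
    per_view = [[cosine(q, item) for item in database] for q in query_views]
    return [combine(column) for column in zip(*per_view)]


def rank(scores):
    """Return database indices ordered from most to least similar."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Toy data: three database descriptors and two views of the same query product.
database = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
query_views = [[0.9, 0.1, 0.0], [0.8, 0.0, 0.2]]

early_ranking = rank(early_fusion_scores(query_views, database))
late_ranking = rank(late_fusion_scores(query_views, database))
print("early fusion ranking:", early_ranking)
print("late fusion ranking:", late_ranking)
```

Passing `combine=sum` instead of `max` gives another common late fusion rule. In a real system each database product may also have several views, which this sketch does not model.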

Muhammet Bastan | Özgür Yilmaz
