Internet-Based Image Retrieval Using End-to-End Trained Deep Distributions

Internet image search engines have long been considered a promising tool for handling open-vocabulary textual user queries over unannotated image datasets. However, systems that rely on them must cope with the multi-modal and noisy image sets that search engines return, especially for polysemous queries. For many queries, only a small part of the returned set is relevant to the user's intent. In this work, we propose an approach that explicitly accounts for the complex and noisy structure of the image sets returned by Internet image search engines. As in many previous image retrieval works, we train a deep convolutional network that maps images to high-dimensional descriptors. To model an image set obtained from the Internet, our approach then fits a simple probabilistic model that accounts for multi-modality and noise (e.g. a Gaussian mixture model) to the deep descriptors of the images in that set. Finally, the resulting distribution model is used to search the unannotated image dataset by evaluating the likelihoods of individual images. As our main contribution, we develop an end-to-end training procedure that tunes the parameters of the deep network on an annotated training set while accounting for the distribution fitting and the subsequent matching. In our experiments, we show that this end-to-end approach improves the accuracy of Internet-based image retrieval for held-out concepts, compared both to retrieval systems that fit similar distribution models to pre-trained features and to simpler end-to-end trained baselines.
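The fit-then-rank step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes descriptors are already computed (synthetic 2-D points stand in for deep descriptors), uses a diagonal-covariance Gaussian mixture fitted by plain EM with a farthest-point initialization, and omits both the deep network and the end-to-end training. All function names (`fit_gmm`, `rank_dataset`, and so on) are hypothetical.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two descriptors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def log_gauss_diag(x, mean, var):
    """Log-density of a diagonal-covariance Gaussian at x."""
    return -0.5 * sum(math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def logsumexp(values):
    """Numerically stable log(sum(exp(values)))."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))


def fit_gmm(points, k, iters=30, min_var=1e-3):
    """Fit a k-component diagonal GMM to `points` with EM (illustrative)."""
    dim = len(points[0])
    # Assumption: farthest-point initialization, chosen here only because
    # it is deterministic; it is not taken from the paper.
    means = [list(points[0])]
    while len(means) < k:
        far = max(points, key=lambda p: min(sq_dist(p, m) for m in means))
        means.append(list(far))
    variances = [[1.0] * dim for _ in range(k)]
    weights = [1.0 / k] * k
    for _ in range(iters):
        # E-step: posterior responsibility of each component for each point.
        resp = []
        for x in points:
            logs = [math.log(weights[j])
                    + log_gauss_diag(x, means[j], variances[j])
                    for j in range(k)]
            norm = logsumexp(logs)
            resp.append([math.exp(v - norm) for v in logs])
        # M-step: re-estimate weights, means, and floored variances.
        for j in range(k):
            nj = sum(r[j] for r in resp) + 1e-12
            weights[j] = nj / len(points)
            means[j] = [sum(r[j] * x[d] for r, x in zip(resp, points)) / nj
                        for d in range(dim)]
            variances[j] = [
                max(min_var,
                    sum(r[j] * (x[d] - means[j][d]) ** 2
                        for r, x in zip(resp, points)) / nj)
                for d in range(dim)]
    return weights, means, variances


def log_likelihood(x, gmm):
    """Log-likelihood of one descriptor under the fitted mixture."""
    weights, means, variances = gmm
    return logsumexp([math.log(w) + log_gauss_diag(x, m, v)
                      for w, m, v in zip(weights, means, variances)])


def rank_dataset(dataset, gmm):
    """Return dataset indices sorted by decreasing log-likelihood."""
    scores = [log_likelihood(x, gmm) for x in dataset]
    return sorted(range(len(dataset)), key=lambda i: scores[i], reverse=True)


# Synthetic stand-in for a polysemous query: two descriptor modes.
rng = random.Random(0)
query_set = ([[rng.gauss(0.0, 0.3), rng.gauss(0.0, 0.3)] for _ in range(30)]
             + [[rng.gauss(5.0, 0.3), rng.gauss(5.0, 0.3)] for _ in range(30)])
# Unannotated dataset: two images near the modes, two far from both.
dataset = [[0.1, -0.1], [5.2, 4.9], [2.5, 2.5], [-4.0, 6.0]]
gmm = fit_gmm(query_set, k=2)
ranking = rank_dataset(dataset, gmm)
print(ranking)
```

In this toy setting, the dataset points lying near either mode of the query set receive the highest likelihoods, which is the behavior a multi-modal model is meant to provide for polysemous queries. A dedicated noise component and the dependence of the descriptors on trainable network parameters are deliberately left out.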
