Multiple queries for large scale specific object retrieval

The aim of large scale specific-object image retrieval systems is to instantaneously find images that contain the query object in the image database. Current systems, for example Google Goggles, concentrate on querying using a single view of an object, e.g. a photo a user takes with a mobile phone, in order to answer the question “what is this?”. Here we consider the somewhat converse problem of finding all images of an object given that the user knows what they are looking for, so the input modality is text, not an image. This problem is useful in a number of settings; for example, media production teams are interested in searching internal databases for images or video footage to accompany news reports and newspaper articles. Given a textual query (e.g. “coca cola bottle”), our approach is to first obtain multiple images of the queried object using textual Google image search. These images are then used to visually query the target database to discover images containing the object of interest. We compare a number of different methods for combining the multiple query images, including discriminative learning. We show that issuing multiple queries significantly improves recall and enables the system to find quite challenging occurrences of the queried object. The system is evaluated quantitatively on the standard Oxford Buildings benchmark dataset, where it achieves very high retrieval performance, and also qualitatively on the TrecVid 2011 known-item search dataset.
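
To make the idea of combining multiple query images concrete, the following is a minimal sketch of one generic option, late fusion of per-query similarity scores. It is an illustration under stated assumptions, not the specific methods compared in the paper: the function names `combine_scores` and `rank_database`, the `max` and `mean` fusion rules, and the score matrix are all hypothetical.

```python
import numpy as np


def combine_scores(scores, method="max"):
    """Fuse per-query similarity scores into one score per database image.

    scores: array of shape (n_queries, n_database); scores[q, i] is the
    similarity between query image q and database image i (assumed to be
    comparable across queries).
    """
    scores = np.asarray(scores, dtype=float)
    if method == "max":
        # An image ranks highly if ANY query image matches it well.
        return scores.max(axis=0)
    if method == "mean":
        # An image ranks highly if query images match it well on average.
        return scores.mean(axis=0)
    raise ValueError(f"unknown fusion method: {method!r}")


def rank_database(scores, method="max"):
    """Return database image indices sorted by decreasing fused score."""
    fused = combine_scores(scores, method)
    return np.argsort(-fused, kind="stable")


# Hypothetical scores: 3 query images (rows) against 4 database images (columns).
scores = np.array([
    [0.9, 0.2, 0.5, 0.0],
    [0.1, 0.7, 0.5, 0.0],
    [0.1, 0.1, 0.5, 0.0],
])

ranking_max = rank_database(scores, "max")
ranking_mean = rank_database(scores, "mean")
print("max fusion ranking: ", ranking_max.tolist())
print("mean fusion ranking:", ranking_mean.tolist())
```

In this toy example the two rules disagree: `max` favours database image 0, which only one query matches strongly, while `mean` favours image 2, which every query matches moderately. This is the kind of trade-off that a comparison of combination methods has to weigh.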