Object retrieval with large vocabularies and fast spatial matching

In this paper, we present a large-scale object retrieval system. The user supplies a query object by selecting a region of a query image, and the system returns a ranked list of images that contain the same object, retrieved from a large corpus. We demonstrate the scalability and performance of our system on a dataset of over 1 million images crawled from the photo-sharing site, Flickr [3], using Oxford landmarks as queries. Building an image-feature vocabulary is a major time and performance bottleneck, due to the size of our dataset. To address this problem we compare different scalable methods for building a vocabulary and introduce a novel quantization method based on randomized trees which we show outperforms the current state-of-the-art on an extensive ground-truth. Our experiments show that the quantization has a major effect on retrieval quality. To further improve query performance, we add an efficient spatial verification stage to re-rank the results returned from our bag-of-words model and show that this consistently improves search quality, though by less of a margin when the visual vocabulary is large. We view this work as a promising step towards much larger, "web-scale " image corpora.

[1]  A FischlerMartin,et al.  Random sample consensus , 1981 .

[2]  Robert C. Bolles,et al.  Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography , 1981, CACM.

[3]  Allen Gersho,et al.  Vector quantization and signal compression , 1991, The Kluwer international series in engineering and computer science.

[4]  Yali Amit,et al.  Shape Quantization and Recognition with Randomized Trees , 1997, Neural Computation.

[5]  Sunil Arya,et al.  An optimal algorithm for approximate nearest neighbor searching fixed dimensions , 1998, JACM.

[6]  Bernhard P. Wrobel,et al.  Multiple View Geometry in Computer Vision , 2001 .

[7]  O. Chum,et al.  ENHANCING RANSAC BY GENERALIZED MODEL OPTIMIZATION Onďrej Chum, Jǐ , 2003 .

[8]  Andrew Zisserman,et al.  Video Google: a text retrieval approach to object matching in videos , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[9]  Charles Elkan,et al.  Using the Triangle Inequality to Accelerate k-Means , 2003, ICML.

[10]  Knut Magne Risvik,et al.  Multi-tier architecture for Web search engines , 2003, Proceedings of the IEEE/LEOS 3rd International Conference on Numerical Simulation of Semiconductor Optoelectronic Devices (IEEE Cat. No.03EX726).

[11]  Luiz André Barroso,et al.  Web Search for a Planet: The Google Cluster Architecture , 2003, IEEE Micro.

[12]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[13]  Cordelia Schmid,et al.  Scale & Affine Invariant Interest Point Detectors , 2004, International Journal of Computer Vision.

[14]  Vincent Lepetit,et al.  Randomized trees for real-time keypoint recognition , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[15]  Stepán Obdrzálek,et al.  Sub-linear Indexing for Large Scale Object Recognition , 2005, BMVC.

[16]  Luc Van Gool,et al.  Edinburgh Research Explorer Simultaneous Object Recognition and Segmentation by Image Exploration , 2022 .

[17]  Antonio Criminisi,et al.  Object Class Recognition at a Glance , 2006 .

[18]  David Nistér,et al.  Scalable Recognition with a Vocabulary Tree , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[19]  Bernt Schiele,et al.  Multiple Object Class Detection with a Generative Model , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).