Towards Large-scale Image Retrieval with a Disk-only Index

Facing ever-growing image databases, the focus of research in content-based image retrieval, where a query image is used to search a large database for images showing the same object or scene, has shifted over the last decade. Instead of using local features such as SIFT together with quantization and inverted-file indexing schemes, models working with global features and exhaustive search have been proposed to cope with limited main memory and increasing query times. This, however, impairs the ability to find small objects in images with cluttered backgrounds. In this paper, we argue that it is worth reconsidering image retrieval with local features, because two crucial ingredients have since become available: large solid-state disks that provide dramatically shorter access times, and more discriminative models that enhance the local features, for example by encoding their spatial neighborhood with features from convolutional neural networks, which results in far fewer random read accesses. We show that properly combining these two insights makes it possible to keep the index of the database images on disk rather than in main memory, which allows even larger databases on today's hardware. As a proof of concept, we support our arguments with experiments on established public datasets for large-scale image retrieval.
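To make the core idea concrete, the following is a minimal sketch, in Python, of a disk-resident inverted file of the kind the abstract alludes to: only a small offset table is kept in main memory, while each posting list is fetched from the SSD with a single seek-and-read per quantized query feature. This is not the paper's actual pipeline; the file names (`postings.bin`, `offsets.npy`), the record layout, and the simple voting score are illustrative assumptions.

```python
import numpy as np

class DiskInvertedIndex:
    """Sketch of an inverted file whose posting lists stay on disk."""

    def __init__(self, postings_path="postings.bin", offsets_path="offsets.npy"):
        # offsets[w] .. offsets[w + 1] delimit the posting list of visual word w
        self.offsets = np.load(offsets_path)        # small, kept in main memory
        self.postings = open(postings_path, "rb")   # large, stays on the SSD

    def postings_for(self, word):
        start, end = int(self.offsets[word]), int(self.offsets[word + 1])
        self.postings.seek(start)                   # one random read per query word
        raw = self.postings.read(end - start)
        # assumed record layout: a flat array of 4-byte image ids
        return np.frombuffer(raw, dtype=np.uint32)

    def score(self, query_words):
        # naive voting: more discriminative local descriptors mean fewer query
        # words and shorter posting lists, hence fewer random SSD accesses
        votes = {}
        for w in query_words:
            for image_id in self.postings_for(w):
                votes[image_id] = votes.get(image_id, 0) + 1
        return sorted(votes.items(), key=lambda kv: -kv[1])
```

Under these assumptions, querying with a few hundred quantized features costs a few hundred short SSD reads, which is what makes a disk-only index plausible on modern hardware.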
