A kernel density based approach for large scale image retrieval

Local image features, such as SIFT descriptors, have been shown to be effective for content-based image retrieval (CBIR). To retrieve images efficiently using local features, most existing approaches represent an image with a bag-of-words model in which every local feature is quantized into a visual word. Given the bag-of-words representation of the images, a text search engine is then used to efficiently find the images that match a given query. The main drawback of these approaches is that the two key steps, i.e., key point quantization and image matching, are carried out separately, leading to sub-optimal retrieval performance. In this work, we present a statistical framework for large-scale image retrieval that unifies key point quantization and image matching by introducing a kernel density function. The key ideas of the proposed framework are (a) each image is represented by a kernel density function from which its observed key points are sampled, and (b) the similarity of a gallery image to a query image is estimated as the likelihood that the kernel density function of the gallery image generates the key points in the query image. We present efficient algorithms for kernel density estimation as well as for effective image matching. Experiments with large-scale image retrieval confirm that the proposed method is not only more effective but also more efficient than state-of-the-art approaches in identifying visually similar images for given queries from large image databases.
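Ideas (a) and (b) can be illustrated with a minimal sketch. It assumes an isotropic Gaussian (Parzen window) kernel with a single bandwidth and a brute-force sum over key points; the abstract does not specify the kernel, the bandwidth selection, or the efficient algorithms, so this is not the authors' method. The function names (`log_density`, `log_likelihood`, `rank_gallery`) and the toy 2-D descriptors are hypothetical and stand in for real 128-D SIFT descriptors.

```python
import math


def log_density(point, gallery_points, bandwidth):
    """Log of a Gaussian kernel density estimate at `point`.

    The density is the average of isotropic Gaussian kernels centered on
    the key points of one gallery image (assumed kernel, not from the paper).
    """
    dim = len(point)
    two_h2 = 2.0 * bandwidth * bandwidth
    # Exponent of each kernel: -||point - g||^2 / (2 h^2)
    exponents = [
        -sum((p - g) ** 2 for p, g in zip(point, centre)) / two_h2
        for centre in gallery_points
    ]
    # Log-sum-exp keeps the sum stable when all exponents are very negative.
    top = max(exponents)
    log_sum = top + math.log(sum(math.exp(e - top) for e in exponents))
    log_norm = -0.5 * dim * math.log(math.pi * two_h2)
    return log_norm + log_sum - math.log(len(gallery_points))


def log_likelihood(query_points, gallery_points, bandwidth):
    """Log-likelihood of the query key points under a gallery image's density."""
    return sum(log_density(q, gallery_points, bandwidth) for q in query_points)


def rank_gallery(query_points, gallery, bandwidth):
    """Return gallery image ids sorted from most to least similar."""
    scores = {
        image_id: log_likelihood(query_points, points, bandwidth)
        for image_id, points in gallery.items()
    }
    return sorted(scores, key=scores.get, reverse=True)


# Toy data: two gallery images with 2-D "descriptors" (illustrative only).
gallery = {
    "near": [(0.0, 0.0), (1.0, 1.0), (0.5, 0.2)],
    "far": [(9.0, 9.0), (8.0, 7.5), (10.0, 8.0)],
}
query = [(0.1, 0.1), (0.9, 1.1)]
ranking = rank_gallery(query, gallery, bandwidth=1.0)
```

In this sketch no key point is ever assigned to a single visual word: each query key point contributes through its distance to every gallery key point, which is the sense in which quantization and matching are handled in one step. The cost is linear in the number of gallery key points per query key point, so a practical system would need the kind of efficient estimation the abstract refers to.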
