Efficient clustering and quantisation of SIFT features: exploiting characteristics of the SIFT descriptor and interest region detectors under image inversion

The SIFT keypoint descriptor is a powerful approach to encoding local image appearance using gradient orientation histograms. By constructing a codebook through k-means clustering and quantising SIFT features against it, images can be represented as bags of visual words for retrieval. Inverting the intensities of an image produces SIFT features that are distinct from those extracted from the original image, even though the underlying local patches are structurally identical. By carefully reordering the elements of a SIFT feature vector extracted from an inverted image patch, we can reconstruct the feature that would have been generated from the non-inverted patch. Furthermore, by examining the local feature detection stage, we can estimate whether a given SIFT feature belongs to the space of inverted features or of non-inverted features, and so consistently separate the space of SIFT features into two distinct subspaces. Exploiting this separation, the cost of codebook construction by clustering can be reduced by up to a factor of four, and the memory consumption of the clustering algorithm can also be reduced, while producing equivalent retrieval results.
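
To make the descriptor manipulation concrete, the following Python/NumPy sketch illustrates the two ingredients the abstract describes: the bin reordering that maps a descriptor extracted from an intensity-inverted patch onto its non-inverted counterpart, and the use of a detector-supplied polarity flag to split features into two subspaces so that only one of them needs to be clustered. The 4x4x8 orientation-fastest descriptor layout, the `polarities` flag (e.g. the sign of the difference-of-Gaussian response, or MSER+/MSER- region type), the "cluster one subspace, mirror the centroids" strategy and the use of SciPy's kmeans2/vq are all assumptions made for illustration; this is one plausible reading of the approach, not necessarily the authors' exact pipeline.

```python
import numpy as np
from scipy.cluster.vq import kmeans2, vq   # illustrative clustering backend, not from the paper


def remap_inverted_sift(descs):
    """Reorder SIFT descriptors so that a feature extracted from an
    intensity-inverted patch becomes the feature the non-inverted patch
    would have produced (the mapping is its own inverse).

    Assumes the usual 4x4 grid of spatial cells, each holding an 8-bin
    gradient-orientation histogram, stored with the orientation bin as the
    fastest-varying index. Inversion flips every gradient by 180 degrees,
    which shifts each histogram by 4 bins; the spatial cells stay in place.
    Accepts a single 128-d vector or an (n, 128) array.
    """
    d = np.asarray(descs, dtype=np.float32)
    hists = d.reshape(d.shape[:-1] + (16, 8))        # 16 cells x 8 orientation bins
    return np.roll(hists, 4, axis=-1).reshape(d.shape)


def build_codebook(descs, polarities, k):
    """Cluster only the 'non-inverted' subspace with k/2 centroids, then
    mirror those centroids to cover the inverted subspace.

    `polarities` is a hypothetical per-feature flag taken from the detector
    (e.g. the sign of the DoG extremum, or MSER+/MSER-): values >= 0 are
    treated as lying in the non-inverted subspace.
    """
    descs = np.asarray(descs, dtype=np.float32)
    canonical = descs[np.asarray(polarities) >= 0]    # roughly half the data
    centroids, _ = kmeans2(canonical, k // 2, minit='++')
    return np.vstack([centroids, remap_inverted_sift(centroids)])


def quantise(descs, polarities, codebook):
    """Assign each feature to a visual word, searching only the half of the
    vocabulary that matches the feature's polarity."""
    descs = np.asarray(descs, dtype=np.float32)
    pos = np.asarray(polarities) >= 0
    half = len(codebook) // 2
    words = np.empty(len(descs), dtype=int)
    words[pos], _ = vq(descs[pos], codebook[:half])
    neg_words, _ = vq(descs[~pos], codebook[half:])
    words[~pos] = neg_words + half                    # offset into the mirrored half
    return words
```

Under this reading, each k-means iteration visits roughly half of the features against half of the centroids, which is where a reduction in clustering cost of up to a factor of four (and a corresponding drop in memory use) could come from; quantisation likewise only searches the half of the vocabulary matching each feature's polarity.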
