Visual word expansion and BSIFT verification for large-scale image search

Recently, great advances have been made in large-scale content-based image search. Most state-of-the-art approaches are based on the bag-of-visual-words model, which represents images with local features such as SIFT. Visual matching between images is obtained by vector quantization of local features. Feature quantization is performed either with hierarchical k-NN, which introduces severe quantization loss, or with approximate nearest neighbor (ANN) search such as a k-d tree, which is computationally inefficient. Moreover, feature matching by quantization ignores the vector distance between features, which may cause many false-positive matches. In this paper, we propose constructing a supporting visual word table for all visual words by visual word expansion. Given the initial quantization result, multiple approximate nearest visual words are identified by looking them up in the supporting visual word table, which improves retrieval recall. In addition, we present a matching verification scheme based on the binary SIFT (BSIFT) signature. We demonstrate that the L2 distance between original SIFT descriptors is well preserved by the Hamming distance between the corresponding BSIFT signatures. With BSIFT verification, false-positive matches can be identified and removed effectively and efficiently, which greatly improves the precision of large-scale image search. We evaluate the proposed approach on two public datasets for large-scale image search. The experimental results demonstrate the effectiveness and efficiency of our scheme.
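The two ideas in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names (`build_support_table`, `expand`, `bsift`, `verify`), the brute-force table construction, and the Hamming threshold `max_dist` are all hypothetical. The binarization rule (thresholding each dimension at the descriptor's own median) is an assumption, since the abstract does not specify how the BSIFT signature is generated.

```python
from statistics import median

def l2_sq(a, b):
    """Squared L2 distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def build_support_table(centroids, k):
    """Offline step: for each visual word, store its k nearest other words.

    Brute force is used here only for clarity; a real codebook would be
    far too large for this (assumption for illustration).
    """
    table = []
    for i, c in enumerate(centroids):
        others = [j for j in range(len(centroids)) if j != i]
        others.sort(key=lambda j: l2_sq(c, centroids[j]))
        table.append(others[:k])
    return table

def expand(word_id, table):
    """Initial quantization result followed by its supporting visual words."""
    return [word_id] + table[word_id]

def bsift(descriptor):
    """Binary signature packed into an int: one bit per dimension.

    Assumed rule: a bit is 1 when the value exceeds the descriptor's median.
    """
    m = median(descriptor)
    bits = 0
    for v in descriptor:
        bits = (bits << 1) | (1 if v > m else 0)
    return bits

def hamming(a, b):
    """Number of differing bits between two signatures."""
    return bin(a ^ b).count("1")

def verify(query_sig, candidate_sigs, max_dist):
    """Indices of candidates whose Hamming distance is within max_dist."""
    return [i for i, s in enumerate(candidate_sigs)
            if hamming(query_sig, s) <= max_dist]

# Toy usage with 2-D "centroids" and 4-D "descriptors" (real SIFT has 128 dims).
centroids = [[0, 0], [1, 0], [5, 5], [6, 5]]
table = build_support_table(centroids, k=1)
candidates = expand(2, table)           # word 2 plus its nearest neighbor
query = bsift([1, 9, 2, 8])
kept = verify(query, [bsift([2, 8, 1, 9]), bsift([9, 1, 8, 2])], max_dist=1)
```

In this toy run, the first candidate descriptor has the same above/below-median pattern as the query and is kept, while the second has the opposite pattern and is rejected. Candidates that share a visual word (or a supporting visual word) with the query are thus filtered by a cheap bitwise comparison rather than a full L2 computation.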
