Key factors for large scale visual vocabulary

The visual vocabulary, the key component of the Bag-of-Words (BoW) model, plays an important role in representing visual content, affecting both effectiveness and efficiency. Although various construction methods have been proposed in previous work, less effort has been devoted to identifying the key factors that affect the performance of a visual vocabulary, especially when building a large-scale one. In this paper, we systematically investigate how performance at the descriptor matching level changes when different visual vocabulary schemes are adopted. From this performance analysis, we then derive several useful observations that can guide the design of more effective and faster image search engines. To verify these observations, we develop a BoW-based image search engine that follows them. Comprehensive experiments demonstrate its search performance in terms of both effectiveness and efficiency.
