Joint Optimization Toward Effective and Efficient Image Search

The bag-of-words (BoW) model is an established and effective method for large-scale image search and indexing. Recent work shows that its performance can be further improved by embedding methods. Although many variants of the BoW model and of embedding methods have been developed, little effort has been made to understand their underlying working mechanisms. In this paper, we systematically investigate how image search performance varies with several factors of the BoW model, and study how embedding methods can be employed to improve it further. We then summarize several observations drawn from experiments on descriptor matching. To validate these observations in real image search, we propose an image search scheme in which the BoW model and the embedding method are jointly optimized for effectiveness and efficiency by following these observations. Our comprehensive experiments demonstrate that applying these observations benefits the design of an image search algorithm, and that the proposed scheme outperforms state-of-the-art methods in both effectiveness and efficiency.
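
As a rough illustration of how a BoW quantizer and a binary embedding can cooperate in descriptor matching, the sketch below assigns each descriptor to its nearest visual word and then refines the match with a short binary signature compared by Hamming distance. It is not the scheme proposed in the paper: the toy codebook, the per-dimension thresholding against the centroid, the `max_hamming` threshold, and all function names are assumptions made for illustration.

```python
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def nearest_word(desc: Vector, codebook: List[Vector]) -> int:
    """Return the index of the closest centroid (the visual word)."""
    def sq_dist(a: Vector, b: Vector) -> float:
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda i: sq_dist(desc, codebook[i]))


def binary_signature(desc: Vector, centroid: Vector) -> int:
    """Pack one bit per dimension: 1 where the descriptor exceeds its centroid.

    Assumption: thresholding against the centroid is a simplification chosen
    for this sketch; real embeddings typically learn projections and thresholds.
    """
    bits = 0
    for x, c in zip(desc, centroid):
        bits = (bits << 1) | (1 if x > c else 0)
    return bits


def encode(desc: Vector, codebook: List[Vector]) -> Tuple[int, int]:
    """Encode a descriptor as (visual word index, binary signature)."""
    word = nearest_word(desc, codebook)
    return word, binary_signature(desc, codebook[word])


def is_match(a: Tuple[int, int], b: Tuple[int, int], max_hamming: int) -> bool:
    """Two descriptors match if they share a word and their signatures are close."""
    same_word = a[0] == b[0]
    hamming = bin(a[1] ^ b[1]).count("1")
    return same_word and hamming <= max_hamming


# Hypothetical 4-D codebook with two visual words.
codebook = [(0.0, 0.0, 0.0, 0.0), (10.0, 10.0, 10.0, 10.0)]
query = encode((1.0, 1.0, -1.0, -1.0), codebook)
close = encode((0.5, 2.0, -0.5, -2.0), codebook)
far = encode((-1.0, -1.0, 1.0, 1.0), codebook)

print(is_match(query, close, max_hamming=1))
print(is_match(query, far, max_hamming=1))
```

All three descriptors fall into the same visual word, so a plain BoW comparison would treat them as matches; in this sketch the binary signature is what separates the nearby descriptor from the distant one within that cell.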
