Exploring Spatial Correlation for Visual Object Retrieval

Bag-of-visual-words (BOVW)-based image representation has received intense attention in recent years and has improved content-based image retrieval (CBIR) significantly. BOVW does not consider the spatial correlation between visual words in natural images and thus biases the generated visual words toward noise when the corresponding visual features are not stable. This article outlines the construction of a visual word co-occurrence matrix by exploring visual word co-occurrence extracted from small affine-invariant regions in a large collection of natural images. Based on this co-occurrence matrix, we first present a novel high-order predictor to accelerate the generation of spatially correlated visual words and a penalty tree (PTree) to continue generating the words after the prediction. Subsequently, we propose two methods of co-occurrence weighting similarity measure for image ranking: Co-Cosine and Co-TFIDF. These two new schemes down-weight the contributions of the words that are less discriminative because of frequent co-occurrences with other words. We conduct experiments on Oxford and Paris Building datasets, in which the ImageNet dataset is used to implement a large-scale evaluation. Cross-dataset evaluations between the Oxford and Paris datasets and Oxford and Holidays datasets are also provided. Thorough experimental results suggest that our method outperforms the state of the art without adding much additional cost to the BOVW model.

[1]  Jeffrey K. Uhlmann,et al.  Satisfying General Proximity/Similarity Queries with Metric Trees , 1991, Inf. Process. Lett..

[2]  David G. Lowe,et al.  Shape indexing using approximate nearest-neighbour search in high-dimensional spaces , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[3]  Sunil Arya,et al.  An optimal algorithm for approximate nearest neighbor searching fixed dimensions , 1998, JACM.

[4]  Andrew Zisserman,et al.  Video Google: a text retrieval approach to object matching in videos , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[5]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[6]  S. Shekhar,et al.  Discovering Co-location Patterns from Spatial Datasets : A General Approach , 2004 .

[7]  Andrew W. Moore,et al.  An Investigation of Practical Approximate Nearest Neighbor Algorithms , 2004, NIPS.

[8]  Cordelia Schmid,et al.  Scale & Affine Invariant Interest Point Detectors , 2004, International Journal of Computer Vision.

[9]  Hui Xiong,et al.  Discovering colocation patterns from spatial data sets: a general approach , 2004, IEEE Transactions on Knowledge and Data Engineering.

[10]  Xuelong Li,et al.  Asymmetric bagging and random subspace for support vector machines-based relevance feedback in image retrieval , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[11]  Xuelong Li,et al.  Direct kernel biased discriminant analysis: a new content-based image retrieval relevance feedback algorithm , 2006, IEEE Transactions on Multimedia.

[12]  David Nistér,et al.  Scalable Recognition with a Vocabulary Tree , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[13]  Michael Isard,et al.  Object retrieval with large vocabularies and fast spatial matching , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[14]  Xuelong Li,et al.  Negative Samples Analysis in Relevance Feedback , 2007, IEEE Transactions on Knowledge and Data Engineering.

[15]  Cordelia Schmid,et al.  Hamming Embedding and Weak Geometric Consistency for Large Scale Image Search , 2008, ECCV.

[16]  Richard I. Hartley,et al.  Optimised KD-trees for fast image descriptor matching , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[17]  Michael Isard,et al.  Lost in quantization: Improving particular object retrieval in large scale image databases , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[18]  Hui Xiong,et al.  Under Consideration for Publication in Knowledge and Information Systems Object Discovery in High-resolution Remote Sensing Images: a Semantic Perspective , 2022 .

[19]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[20]  David G. Lowe,et al.  Fast Approximate Nearest Neighbors with Automatic Algorithm Configuration , 2009, VISAPP.

[21]  C. Schmid,et al.  On the burstiness of visual elements , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[22]  Xuelong Li,et al.  Geometric Mean for Subspace Selection , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[23]  Gang Hua,et al.  Descriptive visual words and visual phrases for image applications , 2009, ACM Multimedia.

[24]  Gang Hua,et al.  Building contextual visual vocabulary for large-scale image applications , 2010, ACM Multimedia.

[25]  Cordelia Schmid,et al.  Accurate Image Search Using the Contextual Dissimilarity Measure , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[26]  Xian-Sheng Hua,et al.  Contextual image retrieval model , 2010, CIVR '10.

[27]  Zheng-Jun Zha,et al.  Query expansion by spatial co-occurrence for image retrieval , 2011, MM '11.

[28]  Meng Wang,et al.  Active learning in multimedia annotation and retrieval: A survey , 2011, TIST.

[29]  Luc Van Gool,et al.  Hello neighbor: Accurate object retrieval with k-reciprocal nearest neighbors , 2011, CVPR 2011.

[30]  Cordelia Schmid,et al.  Product Quantization for Nearest Neighbor Search , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[31]  Bo Geng,et al.  Fast visual word quantization via spatial neighborhood boosting , 2011, 2011 IEEE International Conference on Multimedia and Expo.

[32]  Xian-Sheng Hua,et al.  Interactive Image Search by Color Map , 2011, TIST.

[33]  Gang Hua,et al.  Generating Descriptive Visual Words and Visual Phrases for Large-Scale Image Applications , 2011, IEEE Transactions on Image Processing.

[34]  Zhiwei Li,et al.  Contextual synonym dictionary for visual object retrieval , 2011, ACM Multimedia.

[35]  Tsuhan Chen,et al.  Image retrieval with geometry-preserving visual phrases , 2011, CVPR 2011.

[36]  C. Schmid,et al.  Exploiting descriptor distances for precise image search , 2011 .

[37]  Ming Yang,et al.  Contextual weighting for vocabulary tree based image retrieval , 2011, 2011 International Conference on Computer Vision.

[38]  Dacheng Tao,et al.  Exploiting visual word co-occurrence for image retrieval , 2012, ACM Multimedia.

[39]  Ying Wu,et al.  Object retrieval and localization with spatially-constrained similarity measure and k-NN re-ranking , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[40]  Hervé Jégou,et al.  Negative Evidences and Co-occurences in Image Retrieval: The Benefit of PCA and Whitening , 2012, ECCV.

[41]  Dacheng Tao,et al.  W-Tree Indexing for Fast Visual Word Generation , 2013, IEEE Transactions on Image Processing.

[42]  Hai Yang,et al.  ACM Transactions on Intelligent Systems and Technology - Special Section on Urban Computing , 2014 .

[43]  许超 Exploring Spatial Correlation for Visual Object Retrieval , 2015 .