Bayes pooling of visual phrases for object retrieval

Object retrieval remains an open problem. A promising approach is based on matching visual phrases. However, this matching is often corrupted by visual phrase burstiness, i.e., the repeated occurrence of certain visual phrases. Burstiness leads to over-counting of the visual patterns that co-occur in two images and thus degrades the accuracy of the image similarity measure. In addition, existing methods are incapable of capturing the complete geometric variation between images. In this paper, we propose a novel strategy to address both problems. First, we propose a unified framework for matching geometry-constrained visual phrases. This framework makes it possible to combine the optimal geometry constraints to improve the validity of the matched visual phrases. Second, we address visual phrase burstiness from a probabilistic view. This approach effectively filters out bursty visual phrases by explicitly modelling their distribution. Experiments on five benchmark datasets demonstrate that our method outperforms other approaches consistently and significantly.
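The over-counting effect of burstiness can be illustrated with a toy sketch. It is not the probabilistic pooling proposed in the paper: the phrase labels, the two scoring functions, and the example images are all hypothetical, and deduplication stands in only as the simplest possible contrast to naive counting.

```python
from collections import Counter

def naive_match_count(query_phrases, db_phrases):
    """Count every pairwise match between identical phrase IDs."""
    q, d = Counter(query_phrases), Counter(db_phrases)
    return sum(q[p] * d[p] for p in q.keys() & d.keys())

def deduplicated_match_count(query_phrases, db_phrases):
    """Count each shared phrase ID once, ignoring repetitions."""
    return len(set(query_phrases) & set(db_phrases))

# Hypothetical phrase IDs: "window" is bursty (e.g. a repetitive facade).
query = ["window"] * 4 + ["logo"]
distractor = ["window"] * 5            # shares only the bursty phrase
relevant = ["window", "logo", "door"]  # shares two distinct phrases

naive_distractor = naive_match_count(query, distractor)
naive_relevant = naive_match_count(query, relevant)
dedup_distractor = deduplicated_match_count(query, distractor)
dedup_relevant = deduplicated_match_count(query, relevant)

print(naive_distractor, naive_relevant)   # naive counting favours the distractor
print(dedup_distractor, dedup_relevant)   # deduplication favours the relevant image
```

Under naive counting the distractor outranks the relevant image purely because one phrase repeats, which is the failure mode the abstract describes.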
