Region-Level Visual Consistency Verification for Large-Scale Partial-Duplicate Image Search

Most recent large-scale image search approaches build on the bag-of-visual-words (BOW) model, in which local features are quantized and then efficiently matched between images. However, the limited discriminability of local features and BOW quantization errors cause many mismatches between images, which limits search accuracy. To improve accuracy, geometric verification is widely adopted to identify geometrically consistent local matches, but it is hard to use these matches directly to distinguish partial-duplicate images from non-partial-duplicate images. To address this issue, instead of simply identifying geometrically consistent matches, we propose a region-level visual consistency verification scheme that confirms whether visually consistent region (VCR) pairs exist between images for partial-duplicate search. Specifically, after local feature matching, potential VCRs are constructed by mapping the regions segmented from candidate images to the query image using the properties of the matched local features. Then, a compact gradient descriptor and a convolutional neural network descriptor are extracted and matched between the potential VCRs to verify their visual consistency and determine whether they are VCRs. Moreover, two fast pruning algorithms are proposed to further improve efficiency. Extensive experiments demonstrate that the proposed approach achieves higher accuracy than the state of the art and provides comparable efficiency for large-scale partial-duplicate search tasks.
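As a rough illustration of the region-mapping and verification steps described above, the following minimal sketch maps a candidate-image region into the query image and then compares two region descriptors. It is not the authors' implementation. It assumes that each matched local feature carries a position, scale, and orientation (as SIFT-like features do), that a single match implies a similarity transform, and that descriptors are compared by cosine similarity against a threshold. The function names, the keypoint tuple layout, and the threshold value 0.8 are all hypothetical.

```python
import math

def map_point(p, kp_cand, kp_query):
    """Map point p from the candidate image into the query image.

    Assumption: each keypoint is (x, y, scale, orientation_in_radians),
    and one match defines a similarity transform (scale ratio plus
    rotation difference) anchored at the matched keypoints.
    """
    xc, yc, sc, tc = kp_cand
    xq, yq, sq, tq = kp_query
    s = sq / sc          # relative scale between the two images
    a = tq - tc          # relative rotation between the two images
    dx, dy = p[0] - xc, p[1] - yc
    return (xq + s * (math.cos(a) * dx - math.sin(a) * dy),
            yq + s * (math.sin(a) * dx + math.cos(a) * dy))

def map_region(corners, kp_cand, kp_query):
    """Map every corner of a candidate region into the query image."""
    return [map_point(p, kp_cand, kp_query) for p in corners]

def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def is_visually_consistent(desc_query, desc_cand, threshold=0.8):
    """Hypothetical check: accept the region pair if descriptors are similar enough."""
    return cosine_similarity(desc_query, desc_cand) >= threshold
```

In this sketch a candidate region would be passed to `map_region` as a list of corner points, and descriptors extracted from the original and mapped regions would then be passed to `is_visually_consistent`. The actual descriptors, transform model, and decision rule used in the paper may differ.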
