Integrating SIFT and CNN Feature Matching for Partial-Duplicate Image Detection

With the growing popularity of deep neural networks in computational intelligence, research attention in content-based image detection and retrieval has shifted from handcrafted local features, such as the scale-invariant feature transform (SIFT), to features derived from convolutional neural networks (CNNs). However, existing image-based CNN features, which are extracted directly from entire images, are not suitable for detecting small duplicate regions, while region-based CNN features show limited robustness to image modifications such as rescaling, occlusion, and noise addition. Both limitations degrade the performance of partial-duplicate image detection. To address these issues, we propose an integrated feature matching scheme that combines the matching of SIFT features and CNN features between images for partial-duplicate image detection. In this scheme, we first perform SIFT feature matching based on the bag-of-visual-words model to detect potential duplicate region pairs between images, and then match the CNN features of these regions, extracted from a deep convolutional layer of a CNN, to compute image similarity. Because the scheme exploits both the robustness of SIFT features and the high discriminative power of CNN features, it allows accurate detection. Experimental results show that the proposed approach achieves higher accuracy than state-of-the-art methods while maintaining comparable efficiency.
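The two-stage idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes SIFT keypoints have already been quantized to visual-word IDs, approximates a candidate duplicate region by the bounding box of the matched keypoints, and takes the maximum cosine similarity over candidate region pairs as the image similarity. All function names are hypothetical, and CNN descriptor extraction is left out (descriptors are passed in as plain vectors).

```python
import math
from collections import defaultdict


def match_visual_words(feats_a, feats_b):
    """Stage 1 (assumed form): pair keypoints that share a visual word.

    Each feature is a tuple (word_id, x, y).
    """
    index = defaultdict(list)
    for word, x, y in feats_b:
        index[word].append((x, y))
    matches = []
    for word, x, y in feats_a:
        for point_b in index.get(word, []):
            matches.append(((x, y), point_b))
    return matches


def bounding_box(points):
    """Axis-aligned box (x_min, y_min, x_max, y_max) around the points."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))


def candidate_region_pair(matches):
    """Assumption: one candidate region per image, the box around its matches."""
    if not matches:
        return None
    box_a = bounding_box([a for a, _ in matches])
    box_b = bounding_box([b for _, b in matches])
    return box_a, box_b


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length vectors; 0.0 if either is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def image_similarity(descriptor_pairs):
    """Stage 2 (assumed form): best similarity over candidate region pairs.

    descriptor_pairs holds (descriptor_a, descriptor_b) for each region pair.
    """
    return max((cosine_similarity(u, v) for u, v in descriptor_pairs), default=0.0)
```

In practice the region descriptors would come from a convolutional layer applied to each cropped candidate region; the choice of maximum over mean here is only one plausible way to aggregate region-level scores.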
