Improving image similarity estimation via global distance distribution information

Abstract: Estimating the similarity between two images or image patches is at the heart of many computer vision problems, including content-based image retrieval, image registration, and scene recognition. However, commonly used distance-based similarity estimation is not always reliable, owing to limitations in both image understanding techniques and distance metrics. In this paper, we present a scheme to improve similarity estimation in the image search scenario. To this end, we explore the discriminative capability of the global distance distribution, which is obtained in an unsupervised manner by querying an auxiliary image dataset. Motivational experiments show that global distance distributions can distinguish images of different classes and can therefore be used to enhance the original distance metric. Following this finding, we propose a novel approach that incorporates the global distance distribution into the original distance metric to improve the reliability of similarity estimation. One key novelty of this approach is to model the global distance distribution as a Rayleigh distribution and to represent the difference between two distributions by their relative entropy. In this way, the difference between two global distance distributions can be calculated very efficiently. We also demonstrate that the Rayleigh model yields performance consistent with that obtained using the real distribution. Extensive experiments on three public datasets with various image representations and distance metrics show that the enhanced similarity estimation substantially outperforms the original one. Furthermore, the proposed approach shows the scalability needed to handle large-scale image search scenarios.
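
The efficiency claim can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes the standard one-parameter Rayleigh density f(x; σ) = (x/σ²)·exp(−x²/(2σ²)), for which the maximum-likelihood estimate of σ² is Σx²/(2N) and the relative entropy between two Rayleigh distributions has the closed form r − 1 − ln r with r = σ_p²/σ_q². The function names and the toy distance lists are hypothetical, and how the divergence is combined with the original distance is not specified in the abstract, so that step is omitted.

```python
import math


def rayleigh_sigma2(distances):
    """Maximum-likelihood estimate of the Rayleigh parameter sigma^2.

    `distances` stands for the distances from one image to every image
    of an auxiliary dataset (hypothetical input for illustration).
    """
    n = len(distances)
    return sum(d * d for d in distances) / (2.0 * n)


def rayleigh_kl(sigma2_p, sigma2_q):
    """Relative entropy KL(p || q) between two Rayleigh distributions.

    Closed form: r - 1 - ln(r), where r = sigma2_p / sigma2_q.
    """
    ratio = sigma2_p / sigma2_q
    return ratio - 1.0 - math.log(ratio)


# Toy example: distances of two images to the same auxiliary set.
dists_a = [0.9, 1.1, 1.0, 1.2]
dists_b = [1.8, 2.1, 2.0, 1.9]

s2_a = rayleigh_sigma2(dists_a)
s2_b = rayleigh_sigma2(dists_b)
divergence = rayleigh_kl(s2_a, s2_b)
```

Under these assumptions each image is summarized by a single scalar, so comparing two global distance distributions costs a division and a logarithm rather than a histogram comparison.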
