Min-hash sketch construction via nonparametric clustering

In partial-duplicate image retrieval systems, min-Hash algorithms are widely used because of their high efficiency and robustness. In most min-Hash algorithms, the min-Hash functions are treated as independent and grouped into tuples called sketches, which limits the discriminative power of the sketches. By modeling correlations between min-Hash functions, we propose a novel sketch construction method called Nonparametric Clustering min-Hash (NCmH). In NCmH, the randomly generated min-Hash functions are clustered before they are grouped into sketches, and spatial information is fully used in this process. The constructed sketches preserve abundant spatial information between visual words, so NCmH achieves higher retrieval accuracy than standard min-Hash. Furthermore, our method can be combined with other min-Hash algorithms such as GVP mH [1], PmH [2] and TmH [3] to further improve accuracy. In experiments, we show that our method outperforms standard min-Hash and improves the state-of-the-art min-Hash algorithm on the Oxford 5K dataset and the University of Kentucky dataset.
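For orientation, the standard baseline that the abstract compares against can be sketched as follows. This is a minimal illustration of plain min-Hash sketch construction only, not of NCmH: the abstract does not specify the clustering step or how spatial information enters it, so neither is modeled here. The function names, the representation of an image as a set of visual-word ids, and the use of random permutations as min-Hash functions are all assumptions made for the example.

```python
import random


def make_minhash_functions(vocab_size, num_functions, seed=0):
    """Model each min-Hash function as a random permutation of the vocabulary.

    ranks[w] is the position of visual word w in that permutation.
    (Hypothetical helper; permutations are one common way to realize min-Hash.)
    """
    rng = random.Random(seed)
    functions = []
    for _ in range(num_functions):
        ranks = list(range(vocab_size))
        rng.shuffle(ranks)
        functions.append(ranks)
    return functions


def minhash(words, ranks):
    """Return the visual word in `words` that comes first in the permutation."""
    return min(words, key=lambda w: ranks[w])


def build_sketches(words, functions, sketch_size):
    """Group consecutive min-Hash values into fixed-size tuples (sketches).

    Standard min-Hash groups the functions in their generated order;
    a leftover partial group at the end is dropped.
    """
    values = [minhash(words, ranks) for ranks in functions]
    last_start = len(values) - sketch_size
    return [tuple(values[i:i + sketch_size])
            for i in range(0, last_start + 1, sketch_size)]


def count_sketch_collisions(sketches_a, sketches_b):
    """Count positions at which two images produce the identical sketch."""
    return sum(a == b for a, b in zip(sketches_a, sketches_b))


# Two toy images, each a non-empty set of visual-word ids (illustrative data).
functions = make_minhash_functions(vocab_size=1000, num_functions=6, seed=0)
image_a = {3, 17, 256, 700, 912}
image_b = {3, 17, 256, 700, 913}
sketches_a = build_sketches(image_a, functions, sketch_size=2)
sketches_b = build_sketches(image_b, functions, sketch_size=2)
print(len(sketches_a))                                  # 3 sketches from 6 functions
print(count_sketch_collisions(sketches_a, sketches_b))  # depends on the permutations
```

Two images become retrieval candidates when at least one of their sketches collides. In this baseline the functions are grouped in arbitrary order; per the abstract, NCmH instead clusters the functions before grouping them.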

[1] Ying Wu, et al. Object retrieval and localization with spatially-constrained similarity measure and k-NN re-ranking, 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[2] Andrew Zisserman, et al. Near Duplicate Image Detection: min-Hash and tf-idf Weighting, 2008, BMVC.

[3] Michael Isard, et al. General Theory, 1969.

[4] Michael Isard, et al. Partition Min-Hash for Partial Duplicate Image Discovery, 2010, ECCV.

[5] Michael Isard, et al. Object retrieval with large vocabularies and fast spatial matching, 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[6] Jiri Matas, et al. Fast computation of min-Hash signatures for image collections, 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[7] Andrew Zisserman, et al. Video Google: a text retrieval approach to object matching in videos, 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[8] David Nistér, et al. Scalable Recognition with a Vocabulary Tree, 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[9] Andrei Z. Broder, et al. On the resemblance and containment of documents, 1997, Proceedings. Compression and Complexity of SEQUENCES 1997 (Cat. No.97TB100171).

[10] Qian Zhang, et al. Tree partition voting min-hash for partial duplicate image discovery, 2013, 2013 IEEE International Conference on Multimedia and Expo (ICME).

[11] Jiri Matas, et al. Geometric min-Hashing: Finding a (thick) needle in a haystack, 2009, CVPR.

[12] Jiri Matas, et al. Unsupervised discovery of co-occurrence in sparse high dimensional data, 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[13] Tsuhan Chen, et al. Image retrieval with geometry-preserving visual phrases, 2011, CVPR 2011.

[14] David G. Lowe. Distinctive Image Features from Scale-Invariant Keypoints, 2004.