Fast computation of min-Hash signatures for image collections

A new method for highly efficient min-Hash generation for document collections is proposed. It exploits the inverted file structure, which is available in many applications based on a bag-of-words or set-of-words representation. Fast min-Hash generation is important in applications such as image clustering, where good recall and precision require a large number of min-Hash signatures. Using the set-of-words representation, the novel exact min-Hash generation algorithm achieves approximately a 50-fold speed-up on two datasets with 10^5 and 10^6 images, respectively. We also propose an approximate min-Hash assignment process that reaches a more than 200-fold speed-up at the cost of missing about 2-3% of matches. We also show experimentally that the method generalizes to other modalities with significantly different statistics.
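To illustrate why an inverted file can speed up min-Hash generation, here is a minimal sketch, assuming each document is a non-empty set of integer word ids and each hash function is a random permutation of the vocabulary given as a list `perm` with `perm[word] = rank`. It is not presented as the authors' exact algorithm. The baseline scans every word of every document for each permutation. The inverted-file variant instead visits words in increasing permuted rank, and the first word that reaches a document is, by construction, that document's minimum, so each document is assigned once and the scan stops as soon as every document has a value. The `max_fraction` parameter is a hypothetical knob for an approximate variant that stops after a prefix of the vocabulary and leaves unreached documents unassigned (`None`). All function names are illustrative.

```python
import random
from collections import defaultdict


def minhash_naive(docs, perms):
    """Baseline: for each document and permutation, take the minimum
    permuted rank over all of the document's words."""
    return [[min(perm[w] for w in doc) for perm in perms] for doc in docs]


def build_inverted_file(docs):
    """Map each word id to the list of document ids that contain it."""
    inv = defaultdict(list)
    for d, doc in enumerate(docs):
        for w in doc:
            inv[w].append(d)
    return inv


def minhash_inverted(docs, perms, inv, max_fraction=1.0):
    """Visit words in increasing permuted rank; the first visited word
    found in a document gives that document's min-Hash value.
    With max_fraction < 1, only a prefix of the vocabulary is visited and
    documents not reached keep the value None (approximate variant)."""
    n_docs = len(docs)
    vocab_size = len(perms[0])
    limit = int(max_fraction * vocab_size)
    sigs = [[None] * len(perms) for _ in range(n_docs)]
    for j, perm in enumerate(perms):
        # Invert the permutation: order[rank] = word id.
        order = [0] * vocab_size
        for w, r in enumerate(perm):
            order[r] = w
        remaining = n_docs
        for w in order[:limit]:
            for d in inv.get(w, ()):
                if sigs[d][j] is None:  # first hit is the minimum
                    sigs[d][j] = perm[w]
                    remaining -= 1
            if remaining == 0:  # every document assigned: stop early
                break
    return sigs


if __name__ == "__main__":
    docs = [{0, 1}, {2}]
    perms = [[1, 0, 2]]  # word 0 -> rank 1, word 1 -> rank 0, word 2 -> rank 2
    inv = build_inverted_file(docs)
    print(minhash_inverted(docs, perms, inv))  # [[0], [2]]
```

With the full vocabulary both functions return identical signatures; the saving comes from early termination and from touching each document only once per permutation. The reported speed-ups depend on the real vocabulary and posting-list statistics, which this toy example does not model.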

Jiri Matas | Ondrej Chum