Paper Information - Beyond “Near Duplicates”: Learning Hash Codes for Efficient Similar-Image Retrieval

Finding similar images in a large database is an important, but often computationally expensive, task. In this paper, we present a two-tier similar-image retrieval system with the efficiency characteristics found in simpler systems designed to recognize near-duplicates. We compare the efficiency of lookups based on random projections and learned hashes to 100-times-more-frequent exemplar sampling. Both approaches significantly improve on the results from exemplar sampling, despite having significantly lower computational costs. Learned-hash keys provide the best result, in terms of both recall and efficiency.
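To make the idea of a hash-based lookup concrete, the following is a minimal sketch of random-projection hashing, one of the two key types the abstract compares. It is not the authors' implementation: the feature vectors, the number of bits, and every function name (`make_projections`, `hash_key`, `build_index`, `lookup`) are assumptions chosen for illustration, and the learned-hash variant and the two-tier structure are not shown.

```python
import random
from collections import defaultdict


def make_projections(dim, n_bits, seed=0):
    """Draw n_bits random Gaussian directions (hypothetical helper)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def hash_key(vec, projections):
    """Pack the sign of each projection into one integer key."""
    key = 0
    for plane in projections:
        dot = sum(p * v for p, v in zip(plane, vec))
        key = (key << 1) | (1 if dot >= 0.0 else 0)
    return key


def build_index(vectors, projections):
    """Map each hash key to the ids of the vectors that produced it."""
    index = defaultdict(list)
    for i, vec in enumerate(vectors):
        index[hash_key(vec, projections)].append(i)
    return index


def lookup(query, index, projections):
    """Return candidate ids that share the query's bucket."""
    return index.get(hash_key(query, projections), [])
```

A lookup costs one key computation plus one table access, regardless of database size, which is the efficiency property that near-duplicate systems rely on. In practice several independent tables are usually combined to raise recall; that choice is an assumption here, not a detail taken from the abstract.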

Shumeet Baluja | Michele Covell
