Hashing based feature aggregating for fast image copy retrieval

Recently, methods based on visual words have become very popular in near-duplicate retrieval and content identification. However, obtaining the visual vocabulary by quantization is very time-consuming and does not scale to large databases. In this paper, we propose a fast feature aggregating method for image representation that uses machine-learning-based hashing to aggregate features quickly. Since machine-learning-based hashing effectively preserves the neighborhood structure of the data, it yields visual words with strong discriminability. Furthermore, the generated binary codes make building the image representation a low-complexity operation, so the method is efficient and scalable to large databases. The evaluation shows that our approach significantly outperforms state-of-the-art methods.
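The idea of using binary hash codes as visual words can be illustrated with a minimal sketch. It is not the method of the paper: the hash function here is random-hyperplane hashing, swapped in as a stand-in for the learned hashing the abstract describes, and the names `make_hyperplanes`, `hash_descriptor`, and `aggregate` are hypothetical. Each local descriptor is mapped to a b-bit code, the code is read as a visual word index, and the image is represented by a normalized histogram over the 2^b possible words, so no vocabulary has to be built by clustering.

```python
import random


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw one random hyperplane per output bit.

    Assumption: random projections stand in for a learned hash function.
    """
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]


def hash_descriptor(descriptor, hyperplanes):
    """Map one local descriptor to an integer code with len(hyperplanes) bits."""
    code = 0
    for plane in hyperplanes:
        dot = sum(w * x for w, x in zip(plane, descriptor))
        # Each bit records which side of the hyperplane the descriptor lies on.
        code = (code << 1) | (1 if dot >= 0 else 0)
    return code


def aggregate(descriptors, hyperplanes):
    """Build an L1-normalized histogram of hash codes for one image."""
    num_words = 1 << len(hyperplanes)  # 2^b possible visual words
    hist = [0.0] * num_words
    for descriptor in descriptors:
        hist[hash_descriptor(descriptor, hyperplanes)] += 1.0
    total = sum(hist)
    return [h / total for h in hist] if total else hist


if __name__ == "__main__":
    rng = random.Random(1)
    planes = make_hyperplanes(num_bits=4, dim=8)
    # Toy "image": 50 random 8-dimensional descriptors (real ones would be e.g. SIFT).
    image = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(50)]
    print(aggregate(image, planes))
```

Assigning a descriptor to a word costs one dot product per bit, independent of the vocabulary size, which is the kind of low-complexity aggregation the abstract refers to. A learned hash function would replace `make_hyperplanes` while the aggregation step stays the same.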
