Nonnegative sparse coding induced hashing for image copy detection

Among existing hashing methods, Self-taught hashing (STH) is regarded as the state of the art. However, it still suffers from semantic loss. The cause is that its original optimization objective over the in-sample data is NP-hard, so it is relaxed into Laplacian Eigenmaps (LE) followed by binarization. The structure of the LE embedding is quite dissimilar to that of a binary code, and binarizing the embedding therefore readily leads to significant semantic loss. To overcome this drawback, we combine constrained nonnegative sparse coding with the Support Vector Machine (SVM) to propose a new hashing method, called nonnegative sparse coding induced hashing (NSCIH). Here, nonnegative sparse coding is used to obtain a better intermediate representation, one that can be binarized smoothly. In addition, we build an image copy detection scheme based on the proposed hashing method. Extensive experiments show that NSCIH is superior to the state-of-the-art hashing methods, and that the copy detection scheme can be applied to very large image databases.
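To make the role of the intermediate representation concrete, the following is a minimal sketch, not the paper's algorithm. It assumes a fixed nonnegative dictionary, an L1 penalty `lam`, a multiplicative update in the style of non-negative sparse coding, and a simple threshold `tau` for binarization; all function names and parameter values are hypothetical. The constrained formulation used by NSCIH and its SVM step for unseen images are not reproduced here.

```python
def nonneg_sparse_code(x, dictionary, lam=0.1, iters=200):
    """Approximately minimise 0.5*||x - D s||^2 + lam*sum(s) subject to s >= 0.

    `dictionary` is a list of k nonnegative atoms (each a list of length d);
    `x` is a nonnegative feature vector of length d. The dictionary is held
    fixed, which is an assumption of this sketch.
    """
    k = len(dictionary)
    # Precompute D^T x and D^T D once.
    dtx = [sum(a * v for a, v in zip(atom, x)) for atom in dictionary]
    dtd = [[sum(p * q for p, q in zip(a, b)) for b in dictionary]
           for a in dictionary]
    s = [1.0] * k  # strictly positive start, as multiplicative updates require
    for _ in range(iters):
        # Simultaneous update: s_j <- s_j * (D^T x)_j / ((D^T D s)_j + lam)
        s = [s[j] * dtx[j] /
             (sum(dtd[j][l] * s[l] for l in range(k)) + lam + 1e-12)
             for j in range(k)]
    return s


def binarize(code, tau=0.05):
    """Turn a nonnegative sparse code into bits (threshold tau is an assumption)."""
    return [1 if v > tau else 0 for v in code]


def hamming(a, b):
    """Number of differing bits between two equal-length binary codes."""
    return sum(p != q for p, q in zip(a, b))


# Toy data: three orthogonal atoms and three feature vectors.
atoms = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
original = [1.0, 0.0, 0.0]
near_copy = [0.8, 0.0, 0.05]   # slightly perturbed version of `original`
unrelated = [0.0, 1.0, 0.0]

bits_original = binarize(nonneg_sparse_code(original, atoms))
bits_copy = binarize(nonneg_sparse_code(near_copy, atoms))
bits_unrelated = binarize(nonneg_sparse_code(unrelated, atoms))

print(hamming(bits_original, bits_copy))       # distance to the near copy
print(hamming(bits_original, bits_unrelated))  # distance to the unrelated vector
```

Because the codes are nonnegative and mostly zero, thresholding changes them little, which is the intuition behind using them as the step before binarization. In a copy detection setting, candidate copies would then be ranked by Hamming distance to the query code.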
