Semi-supervised spectral hashing for fast similarity search

Fast similarity search is a key step in many large-scale computer vision and information retrieval tasks. Recently, there has been a surge of research interest in hashing-based techniques that allow approximate but highly efficient similarity search. Most existing hashing methods are unsupervised: they generate binary codes using only the information in unlabeled data, and they have shown promising performance. In this paper, we propose a novel semi-supervised hashing method that takes into account pairwise supervised information, in the form of must-link and cannot-link constraints, and then maximizes the information provided by each bit according to both the labeled and the unlabeled data. Unlike previous work on semi-supervised hashing, we use the squared Euclidean distance to measure the Hamming distance, which leads to a more general Laplacian-matrix-based solution once the binary constraints are removed by relaxation. We also relax the orthogonality constraints to reduce the error incurred when converting the real-valued solution to a binary one. Experimental evaluations on three benchmark datasets show that the proposed method outperforms state-of-the-art approaches.
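To make the link between squared Euclidean distance and Hamming distance concrete, the following is a minimal sketch and not the method proposed in the paper. For codes in {-1, +1}^k, each differing bit contributes (1 - (-1))^2 = 4 to the squared Euclidean distance, so the squared distance equals four times the Hamming distance. The sketch also shows a generic relaxed Laplacian solution: build an affinity matrix, take the low-order eigenvectors of the graph Laplacian, and threshold them at zero. Every name here (`hamming`, `sq_euclidean`, `relaxed_laplacian_codes`, `sigma`) is hypothetical. The way supervision enters (must-link pairs get affinity 1, cannot-link pairs get affinity 0) is an assumption made for illustration, and the sketch keeps the eigenvectors orthogonal, so it does not reproduce the relaxed orthogonality constraints or the per-bit information maximization described above.

```python
import numpy as np


def hamming(a, b):
    """Number of positions where two +/-1 code vectors differ."""
    return int(np.sum(a != b))


def sq_euclidean(a, b):
    """Squared Euclidean distance between two +/-1 code vectors."""
    return int(np.sum((a - b) ** 2))


def relaxed_laplacian_codes(X, must_link, cannot_link, n_bits, sigma=1.0):
    """Illustrative relaxed Laplacian hashing (assumed form, not the paper's).

    X           : (n, d) array of data points
    must_link   : list of (i, j) index pairs that should share codes
    cannot_link : list of (i, j) index pairs that should get different codes
    n_bits      : number of bits per code
    """
    # Unsupervised affinity: Gaussian kernel on pairwise squared distances.
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    W = np.exp(-sq / (2.0 * sigma ** 2))
    np.fill_diagonal(W, 0.0)

    # Assumed way of injecting pairwise supervision: overwrite affinities.
    for i, j in must_link:
        W[i, j] = W[j, i] = 1.0
    for i, j in cannot_link:
        W[i, j] = W[j, i] = 0.0

    # Graph Laplacian L = D - W; it is positive semidefinite because W >= 0.
    L = np.diag(W.sum(axis=1)) - W

    # Relaxation: drop the binary constraint and take the eigenvectors with
    # the smallest eigenvalues, skipping the first (constant) eigenvector.
    _, vecs = np.linalg.eigh(L)  # eigenvalues come back in ascending order
    Y = vecs[:, 1:n_bits + 1]

    # Convert the real-valued solution to binary codes by thresholding at 0.
    return np.where(Y >= 0, 1, -1)


if __name__ == "__main__":
    a = np.array([1, -1, 1, 1])
    b = np.array([1, 1, -1, 1])
    print(hamming(a, b), sq_euclidean(a, b))

    X = np.array([[0.0, 0.0], [0.2, 0.0], [0.0, 0.2],
                  [3.0, 3.0], [3.2, 3.0], [3.0, 3.2]])
    print(relaxed_laplacian_codes(X, [(0, 1)], [(0, 3)], n_bits=1).ravel())
```

Skipping the first eigenvector assumes the affinity graph is connected, so that the zero eigenvalue is simple and corresponds to the constant vector, which carries no information as a bit. The sketch only produces codes for the points it was given and does not address how unseen queries would be hashed.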
