FSpH: Fitted spectral hashing for efficient similarity search

Abstract: Spectral hashing (SpH) is a simple and efficient binary hashing method, but it assumes that the data are sampled from a multidimensional uniform distribution. This assumption is too restrictive in practice. In this paper we propose an improved method, fitted spectral hashing (FSpH), that relaxes the distribution assumption. Our work rests on the fact that one-dimensional data of any distribution can be mapped to a uniform distribution without changing the local neighbor relations among data items. We have found that this mapping follows a regular pattern along each PCA direction and can be fitted well by an S-curve (sigmoid) function. A Fourier function, which has more parameters, also fits the data well. Using the sigmoid and Fourier functions, we therefore propose two binary hashing methods: SFSpH and FFSpH. Experiments show that our methods are efficient and outperform state-of-the-art methods.
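The mapping idea in the abstract can be illustrated with a minimal sketch; this is not the authors' implementation. It assumes a single PCA projection (simulated here with Gaussian samples), maps it to (0, 1) by rank, fits a two-parameter sigmoid by least squares on the logit of the ranks, and thresholds the standard spectral hashing sinusoid for a uniform distribution on [0, 1] to obtain one bit. The function names, the logit-regression fitting procedure, and the choice of mode `k` are all assumptions made for illustration; the abstract does not specify them.

```python
import numpy as np

def empirical_uniform(x):
    """Map 1-D samples to (0, 1) by rank (empirical CDF); order is preserved."""
    ranks = np.argsort(np.argsort(x))
    return (ranks + 0.5) / len(x)

def fit_sigmoid(x, u):
    """Fit u ~ 1 / (1 + exp(-(a * x + b))) by linear least squares on logit(u).

    Assumption: this fitting procedure is chosen for simplicity only.
    """
    logit = np.log(u / (1.0 - u))
    a, b = np.polyfit(x, logit, 1)  # slope a, intercept b
    return a, b

def sigmoid_map(x, a, b):
    """Apply the fitted sigmoid; monotone in x whenever a > 0."""
    return 1.0 / (1.0 + np.exp(-(a * x + b)))

def hash_bit(u, k=1):
    """Threshold the k-th sinusoidal mode on [0, 1] to get one bit per item."""
    return (np.sin(np.pi / 2 + k * np.pi * u) > 0).astype(np.uint8)

rng = np.random.default_rng(0)
# Hypothetical stand-in for the data projected onto one PCA direction.
x = rng.normal(loc=2.0, scale=3.0, size=5000)

u = empirical_uniform(x)       # target: approximately uniform values
a, b = fit_sigmoid(x, u)       # fitted sigmoid parameters
u_fit = sigmoid_map(x, a, b)   # closed-form map usable on unseen points
bits = hash_bit(u_fit, k=1)    # one binary code bit per item
```

Because the fitted sigmoid is monotone, items that are neighbors along the projection stay neighbors after the mapping, and the closed form can be applied to query points that were not in the training sample.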
