SEARCHINGWITH EXPECTATIONS Harsimrat Sandhawalia

Handling large amounts of data, such as large image databases, requires the use of approximate nearest neighbor search techniques. Recently, Hamming embedding methods such as spectral hashing have addressed the problem of obtaining compact binary codes optimizing the trade-off between the memory usage and the probability of retrieving the true nearest neighbors. In this paper, we formulate the problem of generating compact signatures as a rate-distortion problem. In the spirit of source coding algorithms, we aim at minimizing the reconstruction error on the squared distances with a constraint on the memory usage. The vectors are ranked based on the distance estimates to the query vector. Experiments on image descriptors show a significant improvement over spectral hashing.

[1]  Matthijs C. Dorst Distinctive Image Features from Scale-Invariant Keypoints , 2011 .

[2]  David G. Lowe,et al.  Fast Approximate Nearest Neighbors with Automatic Algorithm Configuration , 2009, VISAPP.

[3]  Antonio Torralba,et al.  Spectral Hashing , 2008, NIPS.

[4]  Antonio Torralba,et al.  Ieee Transactions on Pattern Analysis and Machine Intelligence 1 80 Million Tiny Images: a Large Dataset for Non-parametric Object and Scene Recognition , 2022 .

[5]  Cordelia Schmid,et al.  Hamming Embedding and Weak Geometric Consistency for Large Scale Image Search , 2008, ECCV.

[6]  Antonio Torralba,et al.  Small codes and large image databases for recognition , 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[7]  Trevor Darrell,et al.  Nearest-Neighbor Methods in Learning and Vision , 2008, IEEE Trans. Neural Networks.

[8]  Martial Hebert,et al.  Rapid object indexing using locality sensitive hashing and joint 3D-signature space estimation , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[9]  Nicole Immorlica,et al.  Locality-sensitive hashing scheme based on p-stable distributions , 2004, SCG '04.

[10]  Antonio Torralba,et al.  Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope , 2001, International Journal of Computer Vision.

[11]  Yan Ke,et al.  Efficient Near-duplicate Detection and Sub-image Retrieval , 2004 .

[12]  Christian Böhm,et al.  Searching in high-dimensional spaces: Index structures for improving the performance of multimedia databases , 2001, CSUR.

[13]  Piotr Indyk,et al.  Similarity Search in High Dimensions via Hashing , 1999, VLDB.

[14]  Jonathan Goldstein,et al.  When Is ''Nearest Neighbor'' Meaningful? , 1999, ICDT.

[15]  Hans-Jörg Schek,et al.  A Quantitative Analysis and Performance Study for Similarity-Search Methods in High-Dimensional Spaces , 1998, VLDB.

[16]  Jon Louis Bentley,et al.  An Algorithm for Finding Best Matches in Logarithmic Expected Time , 1977, TOMS.