Locality-Sensitive Hashing for Chi2 Distance

In the past 10 years, new powerful algorithms based on efficient data structures have been proposed to solve the problem of Nearest Neighbors search (or Approximate Nearest Neighbors search). If the Euclidean Locality Sensitive Hashing algorithm, which provides approximate nearest neighbors in a euclidean space with sublinear complexity, is probably the most popular, the euclidean metric does not always provide as accurate and as relevant results when considering similarity measure as the Earth-Mover Distance and χ2 distances. In this paper, we present a new LSH scheme adapted to χ2 distance for approximate nearest neighbors search in high-dimensional spaces. We define the specific hashing functions, we prove their local-sensitivity, and compare, through experiments, our method with the Euclidean Locality Sensitive Hashing algorithm in the context of image retrieval on real image databases. The results prove the relevance of such a new LSH scheme either providing far better accuracy in the context of image retrieval than euclidean scheme for an equivalent speed, or providing an equivalent accuracy but with a high gain in terms of processing speed.

[1]  Marcel Worring,et al.  Content-Based Image Retrieval at the End of the Early Years , 2000, IEEE Trans. Pattern Anal. Mach. Intell..

[2]  M. Greenacre Correspondence analysis in practice , 1993 .

[3]  Zhe Wang,et al.  Multi-Probe LSH: Efficient Indexing for High-Dimensional Similarity Search , 2007, VLDB.

[4]  M. Greenacre Correspondence Analysis in Practice, Second Edition , 2007 .

[5]  Matthieu Cord,et al.  Fast approximate kernel-based similarity search for image retrieval task , 2008, 2008 19th International Conference on Pattern Recognition.

[6]  Yan Ke,et al.  Efficient Near-duplicate Detection and Sub-image Retrieval , 2004 .

[7]  Edward Y. Chang,et al.  Support Vector Machine Concept-Dependent Active Learning for Image Retrieval , 2005 .

[8]  Cordelia Schmid,et al.  Product Quantization for Nearest Neighbor Search , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[9]  Matthieu Cord,et al.  Scalable active learning strategy for object category retrieval , 2010, 2010 IEEE International Conference on Image Processing.

[10]  Ilan Shimshoni,et al.  Mean shift based clustering in high dimensions: a texture classification example , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[11]  Matthieu Cord,et al.  Combining visual dictionary, kernel-based similarity and learning strategy for image category retrieval , 2008, Comput. Vis. Image Underst..

[12]  Leonidas J. Guibas,et al.  The Earth Mover's Distance as a Metric for Image Retrieval , 2000, International Journal of Computer Vision.

[13]  Piotr Indyk,et al.  Approximate nearest neighbors: towards removing the curse of dimensionality , 1998, STOC '98.

[14]  Laurent Amsaleg,et al.  NV-Tree: An Efficient Disk-Based Index for Approximate Search in Very Large High-Dimensional Collections , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[15]  Hanan Samet,et al.  Foundations of multidimensional and metric data structures , 2006, Morgan Kaufmann series in data management systems.

[16]  David G. Lowe,et al.  Fast Approximate Nearest Neighbors with Automatic Algorithm Configuration , 2009, VISAPP.

[17]  Patrick Haffner,et al.  Support vector machines for histogram-based image classification , 1999, IEEE Trans. Neural Networks.

[18]  Antonio Torralba,et al.  Spectral Hashing , 2008, NIPS.

[19]  Nicole Immorlica,et al.  Locality-sensitive hashing scheme based on p-stable distributions , 2004, SCG '04.

[20]  Eric R. Ziegel,et al.  The Elements of Statistical Learning , 2003, Technometrics.

[21]  Alexandr Andoni,et al.  Near-Optimal Hashing Algorithms for Approximate Nearest Neighbor in High Dimensions , 2006, 2006 47th Annual IEEE Symposium on Foundations of Computer Science (FOCS'06).