Locality-preserving hashing in multidimensional spaces

We consider locality-preserving hashing, in which adjacent points in the domain are mapped to adjacent or nearly adjacent points in the range, when the domain is a d-dimensional cube. This problem has applications to high-dimensional search and multimedia indexing. We show that simple and natural classes of hash functions are provably good for this problem. We complement this with lower bounds suggesting that our results are essentially the best possible.
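As a rough illustration of the locality property described above, the following Python sketch hashes points of a small d-dimensional grid into a cyclic range and checks how far apart the hash values of adjacent grid points can land. The coordinate-sum hash and the names `coordinate_sum_hash`, `cyclic_distance`, and `max_neighbor_displacement` are assumptions chosen for illustration only; this is not the class of hash functions analyzed in the paper, and it says nothing about how evenly the buckets are loaded.

```python
from itertools import product


def coordinate_sum_hash(point, num_buckets):
    """Toy hash (an illustrative assumption): sum of coordinates modulo the bucket count."""
    return sum(point) % num_buckets


def cyclic_distance(a, b, num_buckets):
    """Distance between two bucket indices when the range is treated as a cycle."""
    diff = abs(a - b) % num_buckets
    return min(diff, num_buckets - diff)


def max_neighbor_displacement(side, dim, num_buckets):
    """Largest cyclic distance between the hashes of any two adjacent grid points.

    Two points are adjacent here if they differ by exactly 1 in a single coordinate.
    """
    worst = 0
    for point in product(range(side), repeat=dim):
        h = coordinate_sum_hash(point, num_buckets)
        for axis in range(dim):
            if point[axis] + 1 < side:
                # Step one unit along this axis to reach a neighbor inside the cube.
                neighbor = point[:axis] + (point[axis] + 1,) + point[axis + 1:]
                h_neighbor = coordinate_sum_hash(neighbor, num_buckets)
                worst = max(worst, cyclic_distance(h, h_neighbor, num_buckets))
    return worst


if __name__ == "__main__":
    # Adjacent points of a 4 x 4 x 4 grid, hashed into 5 cyclic buckets.
    print(max_neighbor_displacement(side=4, dim=3, num_buckets=5))
```

In this toy setting, moving one step along any axis changes the coordinate sum by exactly 1, so adjacent points always land in adjacent buckets of the cyclic range. Whether a hash with this property also spreads points evenly is a separate question that the sketch does not address.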
