A geometric framework for modelling similarity search

We propose a geometric framework for modelling similarity search in large, multidimensional data spaces of a general nature. The framework is built around the concept of a similarity workload, which is a probability metric space Ω (the query domain) with a distinguished finite subspace X (the dataset), together with an assembly of concepts, techniques, and results from metric geometry. As some of the latter are currently being reinvented by the database community, it seems desirable to try to bridge the gap between database research and the relevant work already done in geometry and analysis.
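
To make the notion of a similarity workload concrete, the following is a minimal Python sketch and not the paper's own formalism. It assumes that "probability metric space" means a metric space equipped with a probability measure from which queries are drawn. The names `SimilarityWorkload`, `sample_query`, `range_query`, `nearest_neighbour`, and `euclidean` are hypothetical, and the linear scans stand in for whatever indexing scheme one would actually study.

```python
import math
import random
from dataclasses import dataclass
from typing import Callable, Generic, List, Sequence, Tuple, TypeVar

P = TypeVar("P")  # type of a point in the query domain


@dataclass
class SimilarityWorkload(Generic[P]):
    """Hypothetical container for a workload: metric, query measure, dataset."""

    distance: Callable[[P, P], float]           # metric on the query domain
    sample_query: Callable[[random.Random], P]  # draws one query from the measure
    dataset: Sequence[P]                        # finite subspace X

    def range_query(self, q: P, radius: float) -> List[P]:
        # Linear scan: all dataset points within `radius` of the query.
        return [x for x in self.dataset if self.distance(q, x) <= radius]

    def nearest_neighbour(self, q: P) -> Tuple[P, float]:
        # Linear scan: the closest dataset point and its distance to the query.
        best = min(self.dataset, key=lambda x: self.distance(q, x))
        return best, self.distance(q, best)


def euclidean(a: Tuple[float, float], b: Tuple[float, float]) -> float:
    return math.hypot(a[0] - b[0], a[1] - b[1])


# Assumed toy instance: the unit square with the uniform measure on queries.
workload = SimilarityWorkload(
    distance=euclidean,
    sample_query=lambda rng: (rng.random(), rng.random()),
    dataset=[(0.0, 0.0), (1.0, 1.0), (0.5, 0.5)],
)

rng = random.Random(0)
query = workload.sample_query(rng)
neighbour, dist = workload.nearest_neighbour(query)
```

Keeping the query distribution separate from the dataset reflects the abstract's distinction between the query domain Ω and the finite subspace X: queries need not be dataset points.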
