Overcoming the Curse of Dimensionality

We study the behavior of pivot-based algorithms for similarity searching in metric spaces. We show that they are effective tools for intrinsically high-dimensional spaces, and that their performance depends mainly on the number of pivots used and the precision used to store the distances. We give a simple yet effective recipe for practitioners seeking a black-box method to plug into their applications. In addition, we introduce a new indexing algorithm that achieves the minimum overall CPU search time for a given amount of memory, compared with other state-of-the-art approaches.
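To make the notion of a pivot-based algorithm concrete, the following is a minimal sketch of generic pivot filtering for range queries. It is not the new indexing algorithm announced above. The class name `PivotTable`, the method `range_query`, and the default `euclidean` distance are hypothetical names chosen for illustration, and the pivots are assumed to be supplied by the caller. The sketch precomputes the distance from every object to every pivot and, at query time, uses the triangle inequality |d(q, p) - d(o, p)| <= d(q, o) to discard objects without evaluating d(q, o).

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]


def euclidean(a: Point, b: Point) -> float:
    """Default metric for the sketch; any metric distance can be passed instead."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


class PivotTable:
    """Hypothetical pivot table: stores d(object, pivot) for every object and pivot."""

    def __init__(
        self,
        objects: Sequence[Point],
        pivots: Sequence[Point],
        dist: Callable[[Point, Point], float] = euclidean,
    ) -> None:
        self.objects = list(objects)
        self.pivots = list(pivots)
        self.dist = dist
        # One row per object, one column per pivot.
        self.table = [[dist(o, p) for p in self.pivots] for o in self.objects]

    def range_query(self, q: Point, r: float) -> Tuple[List[Point], int]:
        """Return the objects within distance r of q and the number of
        distance evaluations performed (pivots plus surviving candidates)."""
        q_to_pivots = [self.dist(q, p) for p in self.pivots]
        evaluations = len(self.pivots)
        matches: List[Point] = []
        for obj, row in zip(self.objects, self.table):
            # If |d(q, p) - d(o, p)| > r for some pivot p, then d(q, o) > r
            # by the triangle inequality, so obj cannot be in the answer.
            if any(abs(dq - do) > r for dq, do in zip(q_to_pivots, row)):
                continue
            evaluations += 1
            if self.dist(q, obj) <= r:
                matches.append(obj)
        return matches, evaluations
```

In this sketch the table occupies one stored distance per object per pivot, so the number of pivots and the width of each stored distance together determine the memory footprint. Adding pivots can only tighten the filter, at the cost of more memory and more work per query to scan the table.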
