Probabilistic proximity searching algorithms based on compact partitions

The main obstacle in metric space searching is the so-called curse of dimensionality, which makes searching some metric spaces intrinsically difficult regardless of the algorithm used. A recent trend to overcome this obstacle resorts to probabilistic algorithms, which have been shown to find 99% of the relevant objects at a fraction of the cost of the exact algorithm. Such algorithms are acceptable in most applications because resorting to metric space searching already implies some fuzziness in the retrieval requirements. In this paper, we push further in this direction by developing probabilistic algorithms on data structures whose exact versions are the best for high dimensions. As a result, we obtain probabilistic algorithms that outperform the previous ones. We give new insights into the problem and propose a novel view based on time-bounded searching. We also propose an experimental framework for probabilistic algorithms that permits comparing them in offline mode.
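The idea of time-bounded searching over a compact partition can be illustrated with a minimal sketch. This is not the paper's algorithm, only one plausible reading of the abstract: the data are grouped into clusters with a center and a covering radius, a range query visits the most promising clusters first, and the search stops once a fixed budget of distance evaluations is spent. All names here (`build_clusters`, `budgeted_range_search`, `budget`) are hypothetical, and the Euclidean distance merely stands in for an arbitrary metric.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Point = Tuple[float, ...]
Metric = Callable[[Point, Point], float]


def euclidean(a: Point, b: Point) -> float:
    # Stand-in metric; any function satisfying the metric axioms would do.
    return math.dist(a, b)


def build_clusters(points: Sequence[Point], centers: Sequence[Point],
                   dist: Metric) -> List[Dict]:
    """Assign each point to its nearest center and track covering radii."""
    clusters = [{"center": c, "radius": 0.0, "members": []} for c in centers]
    for p in points:
        best = min(clusters, key=lambda cl: dist(p, cl["center"]))
        best["members"].append(p)
        best["radius"] = max(best["radius"], dist(p, best["center"]))
    return clusters


def budgeted_range_search(clusters: List[Dict], q: Point, r: float,
                          budget: int, dist: Metric) -> Tuple[List[Point], int]:
    """Report points within distance r of q, using at most `budget`
    distance evaluations. Results may be incomplete if the budget runs out."""
    used = 0
    ranked = []
    for cl in clusters:
        if used >= budget:
            break
        d = dist(q, cl["center"])
        used += 1
        # Exact pruning: the query ball cannot intersect this cluster.
        if d - cl["radius"] <= r:
            ranked.append((d, cl))
    # Heuristic (assumed): clusters with closer centers are more promising.
    ranked.sort(key=lambda t: t[0])
    results: List[Point] = []
    for _, cl in ranked:
        for p in cl["members"]:
            if used >= budget:
                return results, used
            used += 1
            if dist(q, p) <= r:
                results.append(p)
    return results, used
```

With an unlimited budget the sketch behaves like an exact range search; lowering `budget` trades recall for a hard bound on the work done, which is the kind of trade-off a probabilistic algorithm exploits.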
