The pyramid-technique: towards breaking the curse of dimensionality

In this paper, we propose the Pyramid-Technique, a new indexing method for high-dimensional data spaces. The Pyramid-Technique is highly adapted to range query processing using the maximum metric Lmax. In contrast to all other index structures, the performance of the Pyramid-Technique does not deteriorate when processing range queries on data of higher dimensionality. The Pyramid-Technique is based on a special partitioning strategy which is optimized for high-dimensional data. The basic idea is to divide the data space first into 2d pyramids sharing the center point of the space as a top. In a second step, the single pyramids are cut into slices parallel to the basis of the pyramid. These slices from the data pages. Furthermore, we show that this partition provides a mapping from the given d-dimensional space to a 1-dimensional space. Therefore, we are able to use a B+-tree to manage the transformed data. As an analytical evaluation of our technique for hypercube range queries and uniform data distribution shows, the Pyramid-Technique clearly outperforms index structures using other partitioning strategies. To demonstrate the practical relevance of our technique, we experimentally compared the Pyramid-Technique with the X-tree, the Hilbert R-tree, and the Linear Scan. The results of our experiments using both, synthetic and real data, demonstrate that the Pyramid-Technique outperforms the X-tree and the Hilbert R-tree by a factor of up to 14 (number of page accesses) and up to 2500 (total elapsed time) for range queries.

[1]  A. Desai Narasimhalu,et al.  Multimedia databases , 1996, Multimedia Systems.

[2]  Hanan Samet,et al.  Ranking in Spatial Databases , 1995, SSD.

[3]  Ramesh C. Jain,et al.  Similarity indexing with the SS-tree , 1996, Proceedings of the Twelfth International Conference on Data Engineering.

[4]  K. Wakimoto,et al.  Efficient and Effective Querying by Image Content , 1994 .

[5]  Christian Böhm,et al.  Fast parallel similarity search in multimedia databases , 1997, SIGMOD '97.

[6]  Surajit Chaudhuri,et al.  Data Warehousing and OLAP for Decision Support (Tutorial) , 1997, SIGMOD Conference.

[7]  Christian Böhm,et al.  Improving the Query Performance of High-Dimensional Index Structures by Bulk-Load Operations , 1998, EDBT.

[8]  Jon Louis Bentley,et al.  An Algorithm for Finding Best Matches in Logarithmic Expected Time , 1977, TOMS.

[9]  Shin'ichi Satoh,et al.  The SR-tree: an index structure for high-dimensional nearest neighbor queries , 1997, SIGMOD '97.

[10]  Hans-Peter Kriegel,et al.  Fast nearest neighbor search in high-dimensional space , 1998, Proceedings 14th International Conference on Data Engineering.

[11]  Hans-Peter Kriegel,et al.  The R*-tree: an efficient and robust access method for points and rectangles , 1990, SIGMOD '90.

[12]  Surajit Chaudhuri,et al.  Data warehousing and OLAP for decision support , 1997, SIGMOD '97.

[13]  Christos Faloutsos,et al.  Declustering using fractals , 1993, [1993] Proceedings of the Second International Conference on Parallel and Distributed Information Systems.

[14]  Douglas Comer,et al.  Ubiquitous B-Tree , 1979, CSUR.

[15]  Jonathan Goldstein,et al.  When Is ''Nearest Neighbor'' Meaningful? , 1999, ICDT.

[16]  Christian Böhm,et al.  Optimal Multidimensional Query Processing Using Tree Striping , 2000, DaWaK.

[17]  Rajiv Mehrotra,et al.  Feature-based retrieval of similar shapes , 1993, Proceedings of IEEE 9th International Conference on Data Engineering.

[18]  J. T. Robinson,et al.  The K-D-B-tree: a search structure for large multidimensional dynamic indexes , 1981, SIGMOD '81.

[19]  Hans-Peter Kriegel,et al.  Efficient User-Adaptable Similarity Search in Large Multimedia Databases , 1997, VLDB.

[20]  Christian Böhm,et al.  A cost model for nearest neighbor search in high-dimensional data space , 1997, PODS.

[21]  Hans-Peter Kriegel,et al.  The X-tree : An Index Structure for High-Dimensional Data , 2001, VLDB.

[22]  H. V. Jagadish,et al.  A retrieval technique for similar shapes , 1991, SIGMOD '91.

[23]  Ramesh C. Jain,et al.  Similarity indexing: algorithms and performance , 1996, Electronic Imaging.

[24]  R. Bayer,et al.  Organization and maintenance of large ordered indices , 1970, SIGFIDET '70.