Generalized multidimensional data mapping and query processing

Multidimensional data points can be mapped to one-dimensional space to exploit single dimensional indexing structures such as the B+-tree. In this article we present a Generalized structure for data Mapping and query Processing (GiMP), which supports extensible mapping methods and query processing. GiMP can be easily customized to behave like many competent indexing mechanisms for multi-dimensional indexing, such as the UB-Tree, the Pyramid technique, the iMinMax, and the iDistance. Besides being an extendible indexing structure, GiMP also serves as a framework to study the characteristics of the mapping and hence the efficiency of the indexing scheme. Specifically, we introduce a metric called mapping redundancy to characterize the efficiency of a mapping method in terms of disk page accesses and analyze its behavior for point, range and kNN queries. We also address the fundamental problem of whether an efficient mapping exists and how to define such a mapping for a given data set.

[1]  Cheng-Shang Chang Calculus , 2020, Bicycle or Unicycle?.

[2]  Christos Faloutsos,et al.  The R+-Tree: A Dynamic Index for Multi-Dimensional Objects , 1987, VLDB.

[3]  T. H. Merrett,et al.  A class of data structures for associative searching , 1984, PODS.

[4]  Hanan Samet,et al.  Ranking in Spatial Databases , 1995, SSD.

[5]  Ramesh C. Jain,et al.  Similarity indexing with the SS-tree , 1996, Proceedings of the Twelfth International Conference on Data Engineering.

[6]  Nick Roussopoulos,et al.  Nearest neighbor queries , 1995, SIGMOD '95.

[7]  Beng Chin Ooi,et al.  iDistance: An adaptive B+-tree based indexing method for nearest neighbor search , 2005, TODS.

[8]  Bernhard Seeger,et al.  XXL - A Library Approach to Supporting Efficient Implementations of Advanced Database Queries , 2001, VLDB.

[9]  Alberto O. Mendelzon,et al.  Similarity-based queries for time series data , 1997, SIGMOD '97.

[10]  Anand Sivasubramaniam,et al.  Analyzing range queries on spatial data , 2000, Proceedings of 16th International Conference on Data Engineering (Cat. No.00CB37073).

[11]  Hans-Peter Kriegel,et al.  The pyramid-technique: towards breaking the curse of dimensionality , 1998, SIGMOD '98.

[12]  Beng Chin Ooi,et al.  Indexing the Distance: An Efficient Method to KNN Processing , 2001, VLDB.

[13]  Antonin Guttman,et al.  R-trees: a dynamic index structure for spatial searching , 1984, SIGMOD '84.

[14]  Christos Faloutsos,et al.  Fractals for secondary key retrieval , 1989, PODS.

[15]  Gerhard Weikum,et al.  ACM Transactions on Database Systems , 2005 .

[16]  Beng Chin Ooi,et al.  Indexing the edges—a simple and yet efficient approach to high-dimensional indexing , 2000, PODS.

[17]  Christos H. Papadimitriou,et al.  On the analysis of indexing schemes , 1997, PODS '97.

[18]  Hans-Peter Kriegel,et al.  Optimal multi-step k-nearest neighbor search , 1998, SIGMOD '98.

[19]  R. A. Silverman,et al.  Introductory Real Analysis , 1972 .

[20]  Hans-Peter Kriegel,et al.  The X-tree : An Index Structure for High-Dimensional Data , 2001, VLDB.

[21]  Christos Faloutsos,et al.  Beyond uniformity and independence: analysis of R-trees using the concept of fractal dimension , 1994, PODS.

[22]  Christian Böhm,et al.  A cost model for nearest neighbor search in high-dimensional data space , 1997, PODS.

[23]  Christos Faloutsos,et al.  FastMap: a fast algorithm for indexing, data-mining and visualization of traditional and multimedia datasets , 1995, SIGMOD '95.

[24]  Christos Faloutsos,et al.  Analysis of the Clustering Properties of the Hilbert Space-Filling Curve , 2001, IEEE Trans. Knowl. Data Eng..

[25]  Jack A. Orenstein Spatial query processing in an object-oriented database system , 1986, SIGMOD '86.

[26]  Shin'ichi Satoh,et al.  The SR-tree: an index structure for high-dimensional nearest neighbor queries , 1997, SIGMOD '97.

[27]  Christian Böhm,et al.  A cost model for query processing in high dimensional data spaces , 2000, TODS.

[28]  Christian Bohm,et al.  A cost model for query processing in high dimensional data spaces , 2000 .

[29]  Hans-Peter Kriegel,et al.  The R*-tree: an efficient and robust access method for points and rectangles , 1990, SIGMOD '90.

[30]  Volker Markl,et al.  Integrating the UB-Tree into a Database System Kernel , 2000, VLDB.

[31]  Jeffrey F. Naughton,et al.  Generalized Search Trees for Database Systems , 1995, VLDB.

[32]  Rudolf Bayer,et al.  The Universal B-Tree for Multidimensional Indexing: general Concepts , 1997, WWCA.

[33]  Eddie Kohler,et al.  The Click modular router , 1999, SOSP.

[34]  Christos Faloutsos,et al.  Fast subsequence matching in time-series databases , 1994, SIGMOD '94.

[35]  J. V. Tucker,et al.  Sets: Naïve, Axiomatic and Applied , 1979 .