Efficient Indexing of High-Dimensional Data Through Dimensionality Reduction

Abstract The performance of the R-tree indexing method is known to deteriorate rapidly when the dimensionality of data increases. In this paper, we present a technique for dimensionality reduction by grouping d distinct attributes into k disjoint clusters and mapping each cluster to a linear space. The resulting k-dimensional space (which may be much smaller than d) can then be indexed using an R-tree efficiently. We present algorithms for decomposing a query region on the native d-dimensional space to corresponding query regions in the k-dimensional space, as well as search and update operations for the “dimensionally-reduced” R-tree. Experiments using real data sets for point, region, and OLAP queries were conducted. The results indicate that there is potential for significant performance gains over a naive strategy in which an R-tree index is created on the native d-dimensional space.

[1]  Christos Faloutsos,et al.  Searching Multimedia Databases by Content , 1996, Advances in Database Systems.

[2]  Christos Faloutsos,et al.  Fractals for secondary key retrieval , 1989, PODS.

[3]  H. V. Jagadzsh Linear Clustering of Objects with Multiple Attributes , 1998 .

[4]  Christos Faloutsos,et al.  FastMap: a fast algorithm for indexing, data-mining and visualization of traditional and multimedia datasets , 1995, SIGMOD '95.

[5]  Hans-Peter Kriegel,et al.  The X-tree : An Index Structure for High-Dimensional Data , 2001, VLDB.

[6]  Christos Faloutsos,et al.  The R+-Tree: A Dynamic Index for Multi-Dimensional Objects , 1987, VLDB.

[7]  David B. Lomet,et al.  A review of recent work on multi-attribute access methods , 1992, SGMD.

[8]  Hans-Peter Kriegel,et al.  The R*-tree: an efficient and robust access method for points and rectangles , 1990, SIGMOD '90.

[9]  Christos Faloutsos,et al.  Multiattribute hashing using Gray codes , 1986, SIGMOD '86.

[10]  Nick Roussopoulos,et al.  Cubetree: organization of and bulk incremental updates on the data cube , 1997, SIGMOD '97.

[11]  Elisa Bertino,et al.  Indexing Techniques for Advanced Database Systems , 1997, The Springer International Series on Advances in Database Systems.

[12]  Antonin Guttman,et al.  R-trees: a dynamic index structure for spatial searching , 1984, SIGMOD '84.

[13]  Beng Chin Ooi,et al.  Extensible Buffer Management of Indexes , 1992, VLDB.

[14]  Christos Faloutsos,et al.  Analysis of the Clustering Properties of the Hilbert Space-Filling Curve , 2001, IEEE Trans. Knowl. Data Eng..

[15]  Jack A. Orenstein Spatial query processing in an object-oriented database system , 1986, SIGMOD '86.