Paper Information - A Hybrid Clustering Criterion for R*-Tree on Business Data

A Hybrid Clustering Criterion for R*-Tree on Business Data

It is well known that multidimensional indices can effectively improve query performance on relational data. The R*-tree, a well-known member of the R-tree family, is one of the most popular and successful multidimensional index structures. The clustering pattern of the objects (i.e., tuples in relational tables) among the R*-tree leaf nodes is one of the decisive factors in the performance of range queries, a common kind of query on business data. How, then, is the clustering pattern formed? In this paper, we point out that the insert algorithm of the R*-tree, and in particular its clustering criterion for choosing subtrees for newly arriving objects, determines the clustering pattern of the tuples among the leaf nodes. Our discussion and observations show that the present clustering criterion of the R*-tree cannot produce a good clustering pattern of tuples when the R*-tree is applied to business data, which greatly degrades query performance. We then introduce a hybrid clustering criterion for the insert algorithm of the R*-tree. Our discussion and experiments indicate that the hybrid criterion clearly improves the query performance of the R*-tree on business data.
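For orientation, the subtree-choice criterion of the original R*-tree (reference [4]) that the abstract refers to can be sketched as follows. This is a minimal illustration, not the hybrid criterion proposed in the paper, whose details are not given in the abstract. The names `choose_subtree`, `area`, `enlarge`, and `overlap` are hypothetical, and the sketch omits the approximation the original algorithm applies to large nodes.

```python
from typing import List, Tuple

# A rectangle is (lower corner, upper corner), each a tuple of coordinates.
Rect = Tuple[Tuple[float, ...], Tuple[float, ...]]


def area(r: Rect) -> float:
    """Volume of a rectangle (product of its side lengths)."""
    result = 1.0
    for lo, hi in zip(r[0], r[1]):
        result *= hi - lo
    return result


def enlarge(r: Rect, s: Rect) -> Rect:
    """Smallest rectangle that covers both r and s."""
    return (tuple(min(a, b) for a, b in zip(r[0], s[0])),
            tuple(max(a, b) for a, b in zip(r[1], s[1])))


def overlap(r: Rect, s: Rect) -> float:
    """Volume of the intersection of r and s (0 if they do not intersect)."""
    result = 1.0
    for rl, rh, sl, sh in zip(r[0], r[1], s[0], s[1]):
        side = min(rh, sh) - max(rl, sl)
        if side <= 0:
            return 0.0
        result *= side
    return result


def choose_subtree(entries: List[Rect], new_rect: Rect,
                   children_are_leaves: bool) -> int:
    """Return the index of the entry that should receive new_rect.

    If the children are leaves: least overlap enlargement, ties broken by
    least area enlargement, then by smallest area.
    Otherwise: least area enlargement, ties broken by smallest area.
    """
    def area_growth(i: int) -> float:
        return area(enlarge(entries[i], new_rect)) - area(entries[i])

    def overlap_growth(i: int) -> float:
        grown = enlarge(entries[i], new_rect)
        others = [e for j, e in enumerate(entries) if j != i]
        before = sum(overlap(entries[i], e) for e in others)
        after = sum(overlap(grown, e) for e in others)
        return after - before

    if children_are_leaves:
        key = lambda i: (overlap_growth(i), area_growth(i), area(entries[i]))
    else:
        key = lambda i: (area_growth(i), area(entries[i]))
    return min(range(len(entries)), key=key)
```

Because every new tuple is routed by this choice at each level, the criterion directly shapes which tuples end up sharing a leaf node, which is the clustering pattern the abstract discusses.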

Yaokai Feng | Akifumi Makinouchi | Zhibin Wang

[1] Yaokai Feng, et al. Improving Query Performance on OLAP-Data Using Enhanced Multidimensional Indices, 2004, ICEIS.

[2] Nick Roussopoulos, et al. An alternative storage organization for ROLAP aggregate views based on cubetrees, 1998, SIGMOD '98.

[3] Ju-Hong Lee, et al. Dynamic Update Cube for Range-sum Queries, 2001, VLDB.

[4] Hans-Peter Kriegel, et al. The R*-tree: an efficient and robust access method for points and rectangles, 1990, SIGMOD '90.

[5] Seokjin Hong, et al. Efficient Execution of Range-Aggregate Queries in Data Warehouse Environments, 2001, ER.

[6] Dimitris Papadias, et al. Algorithms for Querying by Spatial Structure, 1998, VLDB.

[7] Petra Perner, et al. Data Mining - Concepts and Techniques, 2002, Künstliche Intell.

[8] Nick Roussopoulos, et al. Cubetree: organization of and bulk incremental updates on the data cube, 1997, SIGMOD '97.

[9] Hans-Joachim Lenz, et al. The Ra*-tree: an improved R*-tree with materialized data for supporting range queries on OLAP-data, 1998, Proceedings Ninth International Workshop on Database and Expert Systems Applications (Cat. No.98EX130).

[10] Divesh Srivastava, et al. On effective multi-dimensional indexing for strings, 2000, SIGMOD '00.