A Hybrid Clustering Criterion for R*-Tree on Business Data

It is well known that multidimensional indices can improve query performance on relational data. The R*-tree, a well-known member of the R-tree family, is one of the most popular multidimensional index structures. The clustering pattern of objects (i.e., tuples in relational tables) among the R*-tree leaf nodes is one of the decisive factors in the performance of range queries, a common kind of query on business data. How, then, is the clustering pattern formed? In this paper, we point out that the insert algorithm of the R*-tree, and in particular its clustering criterion for choosing subtrees for newly inserted objects, determines the clustering pattern of the tuples among the leaf nodes. Our discussion and observations show that the present clustering criterion of the R*-tree cannot produce a good clustering pattern of tuples when the R*-tree is applied to business data, which greatly degrades query performance. We therefore introduce a hybrid clustering criterion for the insert algorithm of the R*-tree. Our discussion and experiments indicate that the hybrid criterion clearly improves the query performance of the R*-tree on business data.