Improving the R*-tree with outlier handling techniques

The R*-tree, a state-of-the-art spatial index, has already found its way into commercial systems such as Oracle. In this paper, we aim to improve the query performance of the R*-tree. We focus on five widely used spatial queries: range query, aggregation query, nearest neighbor query, skyline query, and join query. The idea is to store outlier objects in internal tree nodes; we call the new structure the ROtree. Here, an outlier is an object that either lies far from other objects or has a large extent (we consider both point objects and objects with extent). If such objects are stored at higher levels of the tree, the lower-level nodes have smaller minimum bounding rectangles, and thus the index performs better. Supporting the dynamic nature of the index requires several structural and algorithmic changes, which the paper discusses. In particular, we show how to identify and handle outlier objects during page overflow and underflow using gain/loss metrics. Extensive experiments reveal that the ROtree significantly outperforms the R*-tree for all five queries.
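
The effect of an outlier on a node's minimum bounding rectangle (MBR) can be illustrated with a small sketch. The abstract does not define the gain/loss metrics, so the code below does not reproduce them: `area_saving` and `pick_outlier` are hypothetical helpers that simply measure how much a node's MBR area would shrink if one entry were moved out of it, and the rectangles are made-up data.

```python
from typing import List, Tuple

# Axis-aligned rectangle as (xmin, ymin, xmax, ymax); a point has zero extent.
Rect = Tuple[float, float, float, float]


def mbr(rects: List[Rect]) -> Rect:
    """Minimum bounding rectangle enclosing all given rectangles."""
    xmins, ymins, xmaxs, ymaxs = zip(*rects)
    return (min(xmins), min(ymins), max(xmaxs), max(ymaxs))


def area(r: Rect) -> float:
    """Area of a rectangle."""
    return (r[2] - r[0]) * (r[3] - r[1])


def area_saving(rects: List[Rect], i: int) -> float:
    """Hypothetical measure (not the paper's metric): reduction in the
    node's MBR area if entry i were stored elsewhere."""
    rest = rects[:i] + rects[i + 1:]
    return area(mbr(rects)) - area(mbr(rest))


def pick_outlier(rects: List[Rect]) -> int:
    """Index of the entry whose removal shrinks the node's MBR the most."""
    return max(range(len(rects)), key=lambda i: area_saving(rects, i))


# Three clustered objects and one distant object (illustrative data only).
leaf = [(0, 0, 1, 1), (1, 1, 2, 2), (0.5, 0.5, 1.5, 1.5), (9, 9, 10, 10)]

i = pick_outlier(leaf)
print("outlier index:", i)
print("MBR area with outlier:", area(mbr(leaf)))
print("MBR area without outlier:", area(mbr(leaf[:i] + leaf[i + 1:])))
```

In this toy data the distant object alone stretches the leaf's MBR across the whole region, so moving it out leaves a much tighter rectangle around the cluster. That is the intuition behind keeping outliers at higher levels of the tree.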
