Efficient evaluation of partially-dimensional range queries in large OLAP datasets

As demand grows for processing multidimensional queries on OLAP (relational) data, the database community has turned its attention to queries, especially range queries, on large OLAP datasets viewed as multidimensional data. It is well known that multidimensional indices help improve the performance of such queries. However, we found that when existing multidimensional indices are used with OLAP data, much information irrelevant to the query must also be read from disk, which greatly degrades search performance. This problem stems from a peculiarity of the queries actually issued against OLAP data: in many OLAP applications, the query conditions constrain only some of the dimensions of the index space, not all of them. In this study, such range queries are called partially-dimensional (PD) range queries. Based on the R*-tree, we propose a new index structure, called the AR*-tree, designed for the queries actually issued against OLAP data. Both mathematical analysis and extensive experiments with different datasets indicate that the AR*-tree clearly improves the performance of PD range queries, especially for large OLAP datasets.
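To make the notion of a PD range query concrete, the following is a minimal sketch in Python. It is not the AR*-tree and does not reproduce any structure from the paper; it only illustrates the query definition given above. The names `matches`, `pd_range_query`, and `mbr_intersects`, the representation of a query as a mapping from dimension index to interval, and the sample data are all assumptions made for illustration.

```python
from typing import Dict, List, Sequence, Tuple

Point = Sequence[float]
# Hypothetical representation: a PD range query maps a dimension index to a
# closed interval [low, high]. Dimensions absent from the mapping are
# unconstrained.
PDQuery = Dict[int, Tuple[float, float]]


def matches(point: Point, query: PDQuery) -> bool:
    """Return True if the point satisfies every constrained dimension."""
    return all(low <= point[d] <= high for d, (low, high) in query.items())


def pd_range_query(points: List[Point], query: PDQuery) -> List[Point]:
    """Brute-force evaluation: scan all points and keep the matching ones."""
    return [p for p in points if matches(p, query)]


def mbr_intersects(mbr_low: Point, mbr_high: Point, query: PDQuery) -> bool:
    """Test whether a bounding rectangle can contain matching points.

    Only the constrained dimensions are inspected; the rectangle's bounds
    on the remaining dimensions play no part in the decision.
    """
    return all(
        mbr_low[d] <= high and low <= mbr_high[d]
        for d, (low, high) in query.items()
    )


# Illustrative 4-dimensional data (made up for this sketch).
points = [(1, 5, 3, 9), (4, 2, 8, 1), (2, 7, 4, 0), (6, 1, 3, 3)]

# The query constrains only dimensions 0 and 2 out of 4.
query = {0: (1, 4), 2: (3, 5)}

result = pd_range_query(points, query)
print(result)  # [(1, 5, 3, 9), (2, 7, 4, 0)]
```

In this sketch, `mbr_intersects` never consults the bounds of the unconstrained dimensions, even though a conventional index node would store them alongside the others. This may be one way to picture the query-irrelevant information the abstract refers to.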
