Parallel multi-dimensional ROLAP indexing

This paper addresses query performance for Relational OLAP (ROLAP) datacubes. We present a distributed multi-dimensional ROLAP indexing scheme that is practical to implement, requires only a small communication volume, and is fully adapted to distributed disks. Our solution is efficient for spatial searches in high dimensions and scales with data size, number of dimensions, and number of processors. The index is also incrementally maintainable. Using "surrogate" group-bys, it supports efficient processing of arbitrary OLAP queries on partial cubes, in which not all group-bys have been materialized. Our experiments show that the ROLAP advantage of better scalability, in comparison to MOLAP, can be maintained while at the same time providing a fast and flexible index for OLAP queries.
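The idea of answering a query on a partial cube through a "surrogate" group-by can be illustrated with a minimal sketch. The abstract does not give the selection rule, so the code below assumes one plausible reading: the surrogate is the materialized group-by with the fewest dimensions that still contains every dimension the query refers to. The function name `pick_surrogate` and the set-based representation of group-bys are hypothetical and chosen only for illustration.

```python
from typing import FrozenSet, Iterable, Optional


def pick_surrogate(
    query_dims: Iterable[str],
    materialized: Iterable[Iterable[str]],
) -> Optional[FrozenSet[str]]:
    """Return a materialized group-by that can answer the query.

    Assumption (not taken from the paper): a group-by can stand in for a
    query if it contains all queried dimensions, and among such candidates
    the one with the fewest dimensions is preferred.
    """
    wanted = frozenset(query_dims)
    # Keep only materialized group-bys that cover every queried dimension.
    candidates = [frozenset(g) for g in materialized if wanted <= frozenset(g)]
    if not candidates:
        return None  # no materialized group-by can answer this query
    # Prefer fewer dimensions; break ties deterministically by name.
    return min(candidates, key=lambda g: (len(g), sorted(g)))


# A partial cube over dimensions A, B, C with only three group-bys stored.
partial_cube = [{"A", "B", "C"}, {"A", "B"}, {"C"}]
surrogate = pick_surrogate({"A"}, partial_cube)
```

Here the group-by on `A` alone is not materialized, so the query would be routed to the group-by on `A, B` and aggregated further. A real system would more likely rank candidates by estimated size or cost rather than by dimension count.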
