Range queries in OLAP data cubes

A range query applies an aggregation operation over all selected cells of an OLAP data cube, where the selection is specified by providing ranges of values for numeric dimensions. We present fast algorithms for range queries for two types of aggregation operations: SUM and MAX. These two operations cover the techniques required for most popular aggregation operations, such as those supported by SQL. For range-sum queries, the essential idea is to precompute some auxiliary information (prefix sums) that is used to answer ad hoc queries at run-time. By maintaining auxiliary information of the same size as the data cube, all range queries for a given cube can be answered in constant time, irrespective of the size of the sub-cube circumscribed by a query. Alternatively, one can keep auxiliary information that is 1/b^d of the size of the d-dimensional data cube, where b is the blocking factor. Answering a range query may then require access to some cells of the data cube in addition to the auxiliary information, but the overall time complexity is typically reduced significantly. We also discuss how the precomputed information is incrementally updated by batching updates to the data cube. Finally, we present algorithms for choosing the subset of the data cube dimensions for which the auxiliary information is computed and the blocking factor to use for each such subset. Our approach to answering range-max queries is based on precomputed max over balanced hierarchical tree structures. We use a branch-and-bound-like procedure to speed up finding the max in a region. We also show that with a branch-and-bound procedure, the average-case complexity is much smaller than the worst-case complexity.
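
The prefix-sum idea for range-sum queries can be illustrated with a minimal two-dimensional sketch. It is not the paper's implementation: the function names `build_prefix_sums` and `range_sum` and the sample cube are hypothetical, and the auxiliary array is padded with one extra row and column of zeros for simplicity, so it is slightly larger than the cube rather than exactly the same size. Under those assumptions, any rectangular range sum needs only four lookups, regardless of how many cells the range covers.

```python
def build_prefix_sums(cube):
    """Return P with P[i][j] = sum of cube[0..i-1][0..j-1].

    P has one extra zero row and column so that queries touching the
    cube boundary need no special cases (an assumption of this sketch).
    """
    rows, cols = len(cube), len(cube[0])
    P = [[0] * (cols + 1) for _ in range(rows + 1)]
    for i in range(rows):
        for j in range(cols):
            # Cell value plus the two overlapping prefixes, minus their overlap.
            P[i + 1][j + 1] = cube[i][j] + P[i][j + 1] + P[i + 1][j] - P[i][j]
    return P


def range_sum(P, lo_i, hi_i, lo_j, hi_j):
    """Sum of cube[lo_i..hi_i][lo_j..hi_j] (inclusive bounds).

    Uses inclusion-exclusion over the four corners of the range,
    so the cost is constant in the size of the queried region.
    """
    return (P[hi_i + 1][hi_j + 1]
            - P[lo_i][hi_j + 1]
            - P[hi_i + 1][lo_j]
            + P[lo_i][lo_j])


# Hypothetical 3 x 4 cube of measure values.
cube = [
    [3, 5, 1, 2],
    [7, 3, 2, 6],
    [2, 4, 2, 3],
]
P = build_prefix_sums(cube)
total = range_sum(P, 1, 2, 1, 3)  # rows 1..2, columns 1..3
```

In d dimensions the same inclusion-exclusion uses 2^d corner lookups, which is still independent of the number of cells the query covers.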
