Techniques for Speeding up Range-Max Queries in OLAP Data Cubes

A range-max query finds the maximum over all selected cells of a data cube, where the selection is specified by providing ranges of values for numeric dimensions. Our general approach to speeding up range-max queries is to precompute and store certain key information about the data cube. In [HAMS97], we gave a tree algorithm based on precomputed maxima over balanced hierarchical tree structures; a branch-and-bound-like [Mit70] procedure was used to prune unnecessary search. In this paper, we propose three orthogonal techniques with the objective of improving the average response time of range-max queries. First, rather than keeping only the index of the largest value at each internal node of the tree, we keep the indices of the t largest values at each internal node and use them to decrease the probability of scanning lower-level nodes. Second, we further partition each sibling set of internal nodes into smaller groups and sort the precomputed indices within each group according to their indexed values. This speeds up the scanning of internal nodes that are at the same level and covered by the query region, without incurring extra storage overhead. Third, we augment the tree with a precomputed reference array for each level of the tree (except the leaf level). Each element of a reference array contains a reference to the next larger value, which is used to speed up the search. We compare our three algorithms with the previous algorithm both analytically and empirically. Based on these comparisons, we then propose and implement a hybrid algorithm, combining the advantages of these orthogonal techniques, that improves the empirically measured range-max query time by as much as 100%. We also give algorithms for incrementally updating the precomputed structures.
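To make the baseline idea concrete, the following is a minimal one-dimensional sketch of a tree of precomputed max indices searched with branch-and-bound pruning. It is an illustration under simplifying assumptions, not the algorithm of [HAMS97] or of this paper: it handles a single dimension, stores only one max index per node (not the t largest), and omits the sorted sibling groups and reference arrays. The class name `RangeMaxTree`, the `fanout` parameter, and the method names are hypothetical.

```python
from typing import List, Optional


class RangeMaxTree:
    """1-D sketch: levels[k][j] is the index of the largest value among
    the cells [j * fanout**k, (j + 1) * fanout**k)."""

    def __init__(self, data: List[float], fanout: int = 4) -> None:
        if fanout < 2:
            raise ValueError("fanout must be at least 2")
        self.data = data
        self.fanout = fanout
        # Level 0: every cell is trivially the max of itself.
        self.levels: List[List[int]] = [list(range(len(data)))]
        # Build higher levels until a single root node remains.
        while len(self.levels[-1]) > 1:
            prev = self.levels[-1]
            self.levels.append([
                max(prev[i:i + fanout], key=data.__getitem__)
                for i in range(0, len(prev), fanout)
            ])

    def query(self, lo: int, hi: int) -> int:
        """Return the index of a maximum value in data[lo..hi], inclusive."""
        if not (0 <= lo <= hi < len(self.data)):
            raise IndexError("invalid query range")
        best = self._search(len(self.levels) - 1, 0, lo, hi, None)
        assert best is not None
        return best

    def _search(self, level: int, node: int, lo: int, hi: int,
                best: Optional[int]) -> Optional[int]:
        span = self.fanout ** level
        start = node * span
        end = min(start + span, len(self.data)) - 1
        if end < lo or start > hi:
            return best  # node does not overlap the query region
        idx = self.levels[level][node]
        # Bound: nothing under this node can beat the current best.
        if best is not None and self.data[idx] <= self.data[best]:
            return best
        # The node's precomputed max lies inside the query: no need to descend.
        if lo <= idx <= hi:
            return idx
        # Otherwise branch into the children one level down.
        first = node * self.fanout
        last = min(first + self.fanout, len(self.levels[level - 1]))
        for child in range(first, last):
            best = self._search(level - 1, child, lo, hi, best)
        return best
```

For example, with `data = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]` and `tree = RangeMaxTree(data, fanout=2)`, the call `tree.query(0, 3)` returns index 2, the position of the value 4. Descent happens only when a node's stored maximum falls outside the query region, which is the situation that keeping several large values per node is intended to make less likely.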

[1]  L. G. Mitten.  Branch-and-Bound Methods: General Formulation and Properties , 1970, Oper. Res..

[2]  Jon Louis Bentley,et al.  Data Structures for Range Searching , 1979, CSUR.

[3]  Jon Louis Bentley,et al.  Multidimensional divide-and-conquer , 1980, CACM.

[4]  George S. Lueker,et al.  Adding range restriction capability to dynamic data structures , 1985, JACM.

[5]  Pravin M. Vaidya.  Space-time tradeoffs for orthogonal range queries , 1985, STOC '85.

[6]  Andrew Chi-Chih Yao.  On the Complexity of Maintaining Partial Sums , 1985, SIAM J. Comput..

[7]  Meng Chang Chen,et al.  On the Data Model and Access Method of Summary Data Management , 1989, IEEE Trans. Knowl. Data Eng..

[8]  Bernard Chazelle,et al.  Computing partial sums in multidimensional arrays , 1989, SCG '89.

[9]  Hanan Samet,et al.  The Design and Analysis of Spatial Data Structures , 1989.

[10]  Jaideep Srivastava,et al.  TBSAM: An Access Method for Efficient Processing of Statistical Queries , 1989, IEEE Trans. Knowl. Data Eng..

[11]  Bernard Chazelle,et al.  Lower bounds for orthogonal range searching: part II. The arithmetic model , 1990, JACM.

[12]  Kyuseok Shim,et al.  Including Group-By in Query Optimization , 1994, VLDB.

[13]  Ashish Gupta,et al.  Aggregate-Query Processing in Data Warehousing Environments , 1995, VLDB.

[14]  Per-Åke Larson,et al.  Eager Aggregation and Lazy Aggregation , 1995, VLDB.

[15]  Jeffrey F. Naughton,et al.  On the Computation of Multidimensional Aggregates , 1996, VLDB.

[16]  Jeffrey F. Naughton,et al.  Storage Estimation for Multidimensional Aggregates in the Presence of Hierarchies , 1996, VLDB.

[17]  George Colliat,et al.  OLAP, relational, and multidimensional database systems , 1996, SGMD.

[18]  Venky Harinarayan,et al.  Implementing Data Cubes Efficiently , 1996.

[19]  D. Shasha,et al.  Hierarchically Split Cube Forests for Decision Support: description and tuned design , 1996.

[20]  Nimrod Megiddo,et al.  Range queries in OLAP data cubes , 1997, SIGMOD '97.

[21]  Jeffrey D. Ullman,et al.  Index selection for OLAP , 1997, Proceedings 13th International Conference on Data Engineering.

[22]  Sunita Sarawagi,et al.  Modeling multidimensional databases , 1997, Proceedings 13th International Conference on Data Engineering.