论文信息 - Techniques for Speeding up Range-Max Queries in OLAP Data Cubes

Techniques for Speeding up Range-Max Queries in OLAP Data Cubes

A range-max query obtains the maximum over all selected cells of a data cube where the selection is speci ed by providing ranges of values for numeric dimensions. Our general approach to speeding up range-max queries is to precompute and store certain key information of the data cube. In [HAMS97], we gave a tree algorithm based on precomputed max over balanced hierarchical tree structures; a branch-and-bound-[Mit70]like procedure was used to prune unnecessary search. In this paper, we propose three orthogonal techniques with the objective of improving the average response time of the range-max queries. First, rather than keeping only the index of the largest value at each internal node of the tree, we keep the indices of the t largest values with each internal node and use them to decrease the probability of scanning lower level nodes. Second, we further partition each sibling set of internal nodes into smaller groups and sort the precomputed indices within each group according to their indexed values. This speeds up the scanning of internal nodes at the same level and covered by the query region without increasing extra storage overhead. Third, we augment the tree with a precomputed reference array for each level of the tree (except for the leaf level). Elements of a reference array contain references to the next larger value, which are used to speed up the search. We compare our three algorithms with the previous algorithm both analytically and empirically. Based on these comparisons, we then propose and implement a hybrid algorithm, combining the advantages of these orthogonal techniques, that improves the empirically measured range-max query time by as much as 100%. We also give algorithms for incrementally updating the precomputed structures.

N. Megiddo | R. Agrawal | Ching-Tien Ho | Jyh-Jong Tsay