Range Top/Bottom k Queries in OLAP Sparse Data Cubes

A range top k query finds the k largest values over all selected cells of an OLAP data cube, where the selection is specified by a range of contiguous values for each dimension. In this paper, we propose a partition-based storage structure that is capable of answering both range top k and range bottom k queries in OLAP sparse data cubes. This is achieved by partitioning a multi-dimensional sparse data cube and storing it in partition-major order in two one-dimensional arrays: one for the dense partitions and the other for the sparse partitions. The structure supports both single-cell updates and bulk batch updates. Moreover, the cost of updating a set of cells in a partition is similar to the cost of updating a single cell, i.e., 2 extra I/Os in most cases and 5 extra I/Os in the worst case, which occurs only rarely. Our approach also reduces the storage cost of sparse data cubes.
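
To make the storage idea concrete, the following is a minimal in-memory sketch, not the structure proposed in the paper. It assumes a two-dimensional cube, square partitions of side `part`, a density threshold for classifying a partition as dense or sparse, and a small directory that maps each partition to its slice of one of the two flat arrays. The class name `PartitionedCube`, its parameters, and the directory layout are all hypothetical, and disk I/O and updates are not modelled.

```python
import heapq


class PartitionedCube:
    """Toy 2-D sparse cube stored partition by partition in two flat arrays."""

    def __init__(self, cells, part=2, dense_threshold=0.5):
        # cells: {(row, col): value}; coordinates that are absent are empty cells.
        # part and dense_threshold are illustrative assumptions.
        self.part = part
        self.dense = []      # every slot of each dense partition (None = empty)
        self.sparse = []     # (local_offset, value) pairs of sparse partitions
        self.directory = {}  # partition id -> (kind, start, length)

        # Group the non-empty cells by the partition they fall into.
        groups = {}
        for (r, c), v in cells.items():
            pid = (r // part, c // part)
            offset = (r % part) * part + (c % part)
            groups.setdefault(pid, {})[offset] = v

        # Lay the partitions out in partition-major (sorted id) order.
        slots = part * part
        for pid in sorted(groups):
            members = groups[pid]
            if len(members) / slots >= dense_threshold:
                start = len(self.dense)
                self.dense.extend(members.get(o) for o in range(slots))
                self.directory[pid] = ("dense", start, slots)
            else:
                start = len(self.sparse)
                self.sparse.extend(sorted(members.items()))
                self.directory[pid] = ("sparse", start, len(members))

    def _values_in_range(self, lo, hi):
        # Yield the values of non-empty cells inside the inclusive range lo..hi.
        (r0, c0), (r1, c1) = lo, hi
        p = self.part
        for pr in range(r0 // p, r1 // p + 1):
            for pc in range(c0 // p, c1 // p + 1):
                entry = self.directory.get((pr, pc))
                if entry is None:
                    continue  # partition holds no data at all
                kind, start, length = entry
                if kind == "dense":
                    items = enumerate(self.dense[start:start + length])
                else:
                    items = self.sparse[start:start + length]
                for offset, value in items:
                    r = pr * p + offset // p
                    c = pc * p + offset % p
                    if value is not None and r0 <= r <= r1 and c0 <= c <= c1:
                        yield value

    def range_top_k(self, lo, hi, k):
        return heapq.nlargest(k, self._values_in_range(lo, hi))

    def range_bottom_k(self, lo, hi, k):
        return heapq.nsmallest(k, self._values_in_range(lo, hi))


cube = PartitionedCube(
    {(0, 0): 5, (0, 1): 9, (1, 0): 3, (1, 1): 7, (0, 3): 4, (3, 2): 8, (2, 0): 1}
)
print(cube.range_top_k((0, 0), (3, 3), 3))     # [9, 8, 7]
print(cube.range_bottom_k((0, 0), (3, 3), 2))  # [1, 3]
```

In this sketch only the fully populated partition lands in the dense array, while the three partitions holding a single cell each are stored as offset and value pairs, which is where the storage saving for sparse cubes comes from. The same range scan serves both query types, with only the selection step differing.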