A Parallel Algorithm for Closed Cube Computation

Closed cubing is an efficient data cube compression technique recently proposed in the literature. It losslessly condenses a group of cells into a single cell when these cells have the same aggregate value and the condensation preserves roll-up/drill-down semantics. Despite its importance, parallel closed cubing for very large data sets has, to the best of the authors' knowledge, not been well studied. This paper presents a parallel algorithm for closed cube construction and querying on low-cost PC clusters using the MapReduce framework. In addition, we prove that as the number of data blocks increases, the storage size of the closed cubes gradually decreases. Users can therefore choose the number of data blocks to balance cube storage against query time. An experimental study demonstrates that the algorithm is efficient and scalable.
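To make the notion of condensing cells concrete, the following is a minimal sequential sketch in Python. It is not the parallel MapReduce algorithm of this paper. It assumes COUNT as the aggregate, represents each tuple by its dimension values only, and uses hypothetical names (`ALL`, `covers`, `closure`, `closed_cube`) chosen for illustration. Every non-empty cell is mapped to its closure, that is, the most specific cell covering exactly the same tuples, so all cells sharing a closure collapse into one stored cell.

```python
from itertools import product

ALL = "*"  # hypothetical marker for an aggregated ("any value") dimension


def covers(cell, row):
    """True if `row` matches `cell` on every non-aggregated dimension."""
    return all(c == ALL or c == v for c, v in zip(cell, row))


def closure(cell, rows):
    """Return the most specific cell covering the same rows, or None if empty."""
    covered = [r for r in rows if covers(cell, r)]
    if not covered:
        return None
    closed = []
    for d in range(len(cell)):
        values = {r[d] for r in covered}
        # A dimension is fixed only if all covered rows agree on its value.
        closed.append(next(iter(values)) if len(values) == 1 else ALL)
    return tuple(closed)


def closed_cube(rows):
    """Brute force: close every candidate cell and keep the distinct closures."""
    dims = len(rows[0])
    domains = [sorted({r[d] for r in rows}) + [ALL] for d in range(dims)]
    cube = {}
    for cell in product(*domains):
        closed = closure(cell, rows)
        if closed is not None:
            # COUNT aggregate: number of base rows covered by the closed cell.
            cube[closed] = sum(covers(closed, r) for r in rows)
    return cube


rows = [("a1", "b1", "c1"), ("a1", "b1", "c2"), ("a2", "b2", "c1")]
cube = closed_cube(rows)
for cell, count in sorted(cube.items()):
    print(cell, count)
```

In this toy table, the cells `(a1, *, *)` and `(*, b1, *)` cover the same two tuples as `(a1, b1, *)`, so only the latter is stored. A query on either of the condensed cells can be answered from the stored cell, which is what makes the condensation lossless.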