A cluster architecture for parallel data warehousing

Describes the parallel, cluster-based implementation of an algorithm for the computation of a database operator known as the datacube. Though a number of efficient sequential algorithms have recently been proposed for this problem, very little research effort has been expended upon cost-effective parallelization techniques. Our approach builds directly upon the existing sequential proposals and is designed to be both load-balanced and communication-efficient. We also provide experimental results that demonstrate the viability of our technique under a variety of test conditions. Ultimately, we show that parallel performance relative to the underlying sequential algorithm (speedup) is near-optimal.