Adaptive Virtual Partitioning for OLAP Query Processing in a Database Cluster

OLAP queries are typically heavy-weight and ad-hoc thus requiring high storage capacity and processing power. In this paper, we address this problem using a database cluster which we see as a cost-effective alternative to a tightly-coupled multiprocessor. We propose a solution to efficient OLAP query processing using a simple data parallel processing technique called adaptive virtual partitioning which dynamically tunes partition sizes, without requiring any knowledge about the database and the DBMS. To validate our solution, we implemented a Java prototype on a 32 node cluster system and ran experiments with typical queries of the TPC-H benchmark. The results show that our solution yields linear, and sometimes super-linear, speedup. In many cases, it outperforms traditional virtual partitioning by factors superior to 10.