OLAP Query Evaluation in a Database Cluster: A Performance Study on Intra-Query Parallelism

While cluster computing is well established, it is not clear how to coordinate clusters consisting of many database components in order to process high workloads. In this paper, we focus on Online Analytical Processing (OLAP) queries, i.e., relatively complex queries whose evaluation tends to be time-consuming, and we report on some observations and preliminary results of our PowerDB project in this context. We investigate how many cluster nodes should be used to evaluate an OLAP query in parallel. Moreover, we provide a classification of OLAP queries, which is used to decide whether and how a query should be parallelized. We run extensive experiments to evaluate these query classes in quantitative terms. Our results are an important step towards a two-phase query optimizer. In the first phase, the coordination infrastructure decomposes a query into subqueries and ships them to appropriate cluster nodes. In the second phase, each cluster node optimizes and evaluates its subquery locally.
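The two-phase idea can be illustrated with a minimal sketch. It is not the PowerDB implementation: it assumes that every node holds a full copy of a hypothetical `sales` table, that subqueries differ only by a range predicate on the key `id`, and that in-memory SQLite databases stand in for cluster nodes. The names `make_node`, `decompose`, and `evaluate` are invented for this illustration.

```python
import sqlite3

def make_node(rows):
    """Create one in-memory database standing in for a cluster node (assumption)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    return conn

def decompose(lo, hi, n_nodes):
    """Phase 1: split the key range [lo, hi] into at most n_nodes contiguous subranges."""
    span = hi - lo + 1
    step = -(-span // n_nodes)  # ceiling division
    ranges = []
    start = lo
    while start <= hi:
        end = min(start + step - 1, hi)
        ranges.append((start, end))
        start = end + 1
    return ranges

# Each node receives the same subquery text with a different key range.
# Its local engine plans and runs the subquery on its own (phase 2).
SUBQUERY = (
    "SELECT region, SUM(amount), COUNT(*) FROM sales "
    "WHERE id BETWEEN ? AND ? GROUP BY region"
)

def evaluate(nodes, lo, hi):
    """Ship one subquery per node, then merge the partial aggregates."""
    partials = {}
    for node, (a, b) in zip(nodes, decompose(lo, hi, len(nodes))):
        for region, total, count in node.execute(SUBQUERY, (a, b)):
            t, c = partials.get(region, (0.0, 0))
            partials[region] = (t + total, c + count)
    # AVG is rebuilt from SUM and COUNT; averaging the partial averages would be wrong.
    return {region: (t, c, t / c) for region, (t, c) in partials.items()}

if __name__ == "__main__":
    rows = [(i, "north" if i % 2 else "south", float(i)) for i in range(1, 11)]
    nodes = [make_node(rows) for _ in range(3)]
    print(evaluate(nodes, 1, 10))
```

The sketch runs the subqueries one after another for simplicity; a real coordinator would dispatch them concurrently. The merge step also shows why not every query parallelizes equally well: sums and counts combine trivially, while other operations need more work at the coordinator.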
