Benchmarking with TPC-H on Off-the-Shelf Hardware - An Experiments Report

Most medium-sized enterprises run their databases on inexpensive off-the-shelf hardware, yet they still require answers to complex queries, such as ad-hoc Decision Support System (DSS) queries, within a reasonable time window. It is therefore increasingly important that the chosen database system and its tuning be well suited to the specific database size and design. Such optimization could be carried out in-house, based on tests with academic database benchmarks adapted to the small-scale, easy-to-use requirements of a medium-sized enterprise. This paper focuses on the industry-standard TPC-H database benchmark, which measures the performance of ad-hoc DSS queries. Since the available TPC-H results all feature large databases running on high-end hardware, we assess whether the standard test can be meaningfully scaled down and performed on the off-the-shelf hardware common in medium-sized enterprises. We present the benchmark in detail, along with the steps a non-expert must take to run a test following the TPC-H specifications. In addition, we report our own benchmark tests, comparing an open-source and a commercial database server running on inexpensive off-the-shelf hardware under a number of equivalent configurations, varying the parameters that affect the performance of DSS queries.
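As an illustration of the kind of single-query timing such a downscaled test involves (a sketch, not code from the paper), the following Python snippet times TPC-H Query 1 against a MySQL instance previously loaded with dbgen-generated data. The choice of MySQL via mysql-connector-python, the connection parameters, and the `tpch` schema name are assumptions made for this example only.

```python
import time
import mysql.connector  # assumes the mysql-connector-python package is installed

# TPC-H Query 1 (pricing summary report), written here in MySQL syntax.
TPCH_Q1 = """
SELECT l_returnflag, l_linestatus,
       SUM(l_quantity)                                       AS sum_qty,
       SUM(l_extendedprice)                                  AS sum_base_price,
       SUM(l_extendedprice * (1 - l_discount))               AS sum_disc_price,
       SUM(l_extendedprice * (1 - l_discount) * (1 + l_tax)) AS sum_charge,
       AVG(l_quantity)                                       AS avg_qty,
       AVG(l_extendedprice)                                  AS avg_price,
       AVG(l_discount)                                       AS avg_disc,
       COUNT(*)                                              AS count_order
FROM lineitem
WHERE l_shipdate <= DATE_SUB('1998-12-01', INTERVAL 90 DAY)
GROUP BY l_returnflag, l_linestatus
ORDER BY l_returnflag, l_linestatus
"""

def time_query(conn, sql):
    """Run one query and return (elapsed seconds, number of result rows)."""
    cur = conn.cursor()
    start = time.perf_counter()
    cur.execute(sql)
    rows = cur.fetchall()  # force full materialization of the result set
    elapsed = time.perf_counter() - start
    cur.close()
    return elapsed, len(rows)

if __name__ == "__main__":
    # Hypothetical connection details; adjust to the server under test.
    conn = mysql.connector.connect(
        host="localhost", user="tpch", password="tpch", database="tpch"
    )
    secs, nrows = time_query(conn, TPCH_Q1)
    print(f"TPC-H Q1: {secs:.2f} s, {nrows} rows")
    conn.close()
```

Repeating such a measurement for each of the 22 TPC-H queries, under each server configuration of interest, yields the per-query timings that a comparison of this kind is built on.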
