Benchmarking with TPC-H on Off-the-Shelf Hardware - An Experiments Report

Most medium-sized enterprises run their databases on inexpensive off-the-shelf hardware, yet they still require answers to complex queries, such as ad-hoc Decision Support System (DSS) queries, within a reasonable time window. It is therefore increasingly important that the chosen database system and its tuning be well suited to the specific database size and design. Such optimization could be carried out in-house, based on tests with academic database benchmarks adapted to the small-scale, easy-to-use requirements of a medium-sized enterprise. This paper focuses on the industry-standard TPC-H database benchmark, which measures the performance of ad-hoc DSS queries. Since the available TPC-H results all feature large databases running on high-end hardware, we assess whether the standard test can be meaningfully scaled down and performed on the off-the-shelf hardware common in medium-sized enterprises. We present the benchmark in detail, along with the steps a non-expert must take to run a test following the TPC-H specifications. In addition, we report our own benchmark tests, comparing an open-source and a commercial database server running on inexpensive off-the-shelf hardware under a number of equivalent configurations, varying the parameters that affect the performance of DSS queries.
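As an illustration of the kind of single-query timing such a downscaled test involves (a sketch, not code from the paper), the following Python snippet times TPC-H Query 1 against a MySQL instance previously loaded with dbgen-generated data. The choice of MySQL via mysql-connector-python, the connection parameters, and the `tpch` schema name are assumptions made for this example only.

```python
import time
import mysql.connector  # assumes the mysql-connector-python package is installed

# TPC-H Query 1 (pricing summary report), written here in MySQL syntax.
TPCH_Q1 = """
SELECT l_returnflag, l_linestatus,
       SUM(l_quantity)                                       AS sum_qty,
       SUM(l_extendedprice)                                  AS sum_base_price,
       SUM(l_extendedprice * (1 - l_discount))               AS sum_disc_price,
       SUM(l_extendedprice * (1 - l_discount) * (1 + l_tax)) AS sum_charge,
       AVG(l_quantity)                                       AS avg_qty,
       AVG(l_extendedprice)                                  AS avg_price,
       AVG(l_discount)                                       AS avg_disc,
       COUNT(*)                                              AS count_order
FROM lineitem
WHERE l_shipdate <= DATE_SUB('1998-12-01', INTERVAL 90 DAY)
GROUP BY l_returnflag, l_linestatus
ORDER BY l_returnflag, l_linestatus
"""

def time_query(conn, sql):
    """Run one query and return (elapsed seconds, number of result rows)."""
    cur = conn.cursor()
    start = time.perf_counter()
    cur.execute(sql)
    rows = cur.fetchall()  # force full materialization of the result set
    elapsed = time.perf_counter() - start
    cur.close()
    return elapsed, len(rows)

if __name__ == "__main__":
    # Hypothetical connection details; adjust to the server under test.
    conn = mysql.connector.connect(
        host="localhost", user="tpch", password="tpch", database="tpch"
    )
    secs, nrows = time_query(conn, TPCH_Q1)
    print(f"TPC-H Q1: {secs:.2f} s, {nrows} rows")
    conn.close()
```

Repeating such a measurement for each of the 22 TPC-H queries, under each server configuration of interest, yields the per-query timings that a comparison of this kind is built on.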
