We introduce JCC-H, a drop-in replacement for the TPC-H data and query generators that introduces Join-Crossing-Correlations (JCC) and skew into the dataset and query workload. The correlations are designed so that filter predicates on table columns in the existing TPC-H queries now affect the value, frequency, and join fan-out distributions experienced by operators in the query plan. The JCC-H query generator produces parameter bindings for the 22 query templates in two equivalence classes: templates that receive “normal” parameters do not experience skew and behave very similarly to default TPC-H queries, whereas templates expanded with “skewed” parameters experience strong join-crossing-correlations and skew in filter, aggregation, and join operations. In this paper we discuss the goals of JCC-H and its detailed design, and present initial experiments on both a single-server and an MPP database system that confirm that our design goals were largely met. In all, JCC-H gives any system already tested with TPC-H a convenient way to examine how well it handles skew and correlations. We hope the community can use it to make progress on skew mitigation and on the detection and exploitation of join-crossing-correlations in query optimizers and data storage.