Generating databases for query workloads

To evaluate the performance of database applications and DBMSs, we usually execute query workloads on generated databases of different sizes and measure the response times. This paper introduces MyBenchmark, an offline data generation tool that takes a set of queries as input and generates database instances such that users can control the characteristics of the resulting workload. Applications of MyBenchmark include database testing, database application testing, and application-driven benchmarking. We present the architecture and implementation algorithms of MyBenchmark, together with evaluation results on TPC workloads.
