MyBenchmark: generating databases for query workloads

To evaluate the performance of database applications and database management systems (DBMSs), we usually execute query workloads on generated databases of different sizes and measure metrics such as response time and throughput. This paper introduces MyBenchmark, a parallel data generation tool that takes a set of queries as input and generates database instances. Users of MyBenchmark can control the characteristics of both the generated data and the resulting workload. Applications of MyBenchmark include DBMS testing, database application testing, and application-driven benchmarking. In this paper, we present the architecture of MyBenchmark and the algorithms used to implement it. Experimental results show that MyBenchmark is able to generate workload-aware databases for a variety of workloads, including query workloads extracted from the TPC-C, TPC-E, TPC-H, and TPC-W benchmarks.
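The abstract does not detail MyBenchmark's generation algorithms. As a minimal illustration of the underlying idea, the sketch below assumes a single table with one integer column and one hypothetical query, `SELECT * FROM T WHERE col < threshold`, whose output cardinality the user fixes in advance. The rows and the required number of matching rows are split across partitions, and each partition is generated independently with its own seed, so partitions can be produced in parallel. The functions `generate_partition` and `generate_column` and all their parameters are hypothetical names chosen for this sketch. This is not MyBenchmark's actual algorithm: real workloads such as those extracted from TPC-H involve joins and many interacting queries, which the sketch does not attempt to handle.

```python
import random
from concurrent.futures import ThreadPoolExecutor
from typing import List


def generate_partition(part_id: int, part_size: int, hits: int,
                       threshold: int, domain_max: int, seed: int = 42) -> List[int]:
    """Generate one partition of a hypothetical integer column.

    Exactly `hits` values fall in [0, threshold), so they satisfy the
    predicate `col < threshold`; the rest fall in [threshold, domain_max).
    A per-partition seed makes each partition reproducible on its own.
    """
    rng = random.Random(seed * 1_000_003 + part_id)
    inside = [rng.randrange(0, threshold) for _ in range(hits)]
    outside = [rng.randrange(threshold, domain_max) for _ in range(part_size - hits)]
    values = inside + outside
    rng.shuffle(values)  # avoid clustering matching rows at the front
    return values


def generate_column(num_rows: int, num_parts: int, target_card: int,
                    threshold: int, domain_max: int) -> List[int]:
    """Generate a column on which `col < threshold` returns exactly target_card rows."""
    if not 0 <= target_card <= num_rows:
        raise ValueError("target cardinality must lie in [0, num_rows]")
    if not 0 < threshold < domain_max:
        raise ValueError("need 0 < threshold < domain_max")

    # Spread rows and predicate hits as evenly as possible over partitions;
    # this guarantees hits <= size for every partition.
    row_base, row_extra = divmod(num_rows, num_parts)
    hit_base, hit_extra = divmod(target_card, num_parts)
    tasks = [
        (i,
         row_base + (1 if i < row_extra else 0),
         hit_base + (1 if i < hit_extra else 0),
         threshold,
         domain_max)
        for i in range(num_parts)
    ]

    # Partitions share no state, so they can be generated concurrently;
    # pool.map preserves partition order in the result.
    with ThreadPoolExecutor() as pool:
        parts = list(pool.map(lambda args: generate_partition(*args), tasks))

    return [value for part in parts for value in part]
```

For example, `generate_column(10_000, 4, 2_500, 100, 1_000)` yields 10,000 values of which exactly 2,500 are below 100, so the hypothetical query returns 2,500 rows regardless of how many partitions are used. A thread pool keeps the sketch self-contained; a data generator aiming for real speedups would more likely distribute partitions across processes or machines.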
