Data Generation for Application-Specific Benchmarking

The Transaction Processing Performance Council (TPC) has played a pivotal role in the growth of the database industry over the last twenty-five years. However, its handful of domain-specific benchmarks are increasingly irrelevant to the multitude of data-centric applications, and its top-down process is too slow. This mismatch calls for a paradigm shift to a bottom-up community effort to develop tools for application-specific benchmarking. Such a development program would center around techniques for synthetically scaling (up or down) an empirical dataset. This engineering effort in turn requires the development of a database theory of attribute value correlation.
