Benchmarking cloud serving systems with YCSB

While the use of MapReduce systems (such as Hadoop) for large-scale data analysis has been widely recognized and studied, we have recently seen an explosion in the number of systems developed for cloud data serving. These newer systems address "cloud OLTP" applications, though they typically do not support ACID transactions. Examples of systems proposed for cloud serving use include BigTable, PNUTS, Cassandra, HBase, Azure, CouchDB, SimpleDB, Voldemort, and many others. Further, they are being applied to a diverse range of applications that differ considerably from traditional (e.g., TPC-C-like) serving workloads. The number of emerging cloud serving systems and the wide range of proposed applications, coupled with a lack of apples-to-apples performance comparisons, make it difficult to understand the tradeoffs between systems and the workloads for which they are suited. We present the "Yahoo! Cloud Serving Benchmark" (YCSB) framework, with the goal of facilitating performance comparisons of the new generation of cloud data serving systems. We define a core set of benchmarks and report results for four widely used systems: Cassandra, HBase, Yahoo!'s PNUTS, and a simple sharded MySQL implementation. By making our benchmark tool available as open source, we also hope to foster the development of additional cloud benchmark suites that represent other classes of applications. In this regard, a key feature of the YCSB framework is that it is extensible: it supports easy definition of new workloads, in addition to making it easy to benchmark new systems.
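To make the extensibility claim concrete, the sketch below shows the kind of pluggable client interface such a framework might expose. This is an illustrative assumption, not the tool's actual API: the class and method names (BenchmarkClient, read, update, insert, scan) are hypothetical, but they capture the idea that benchmarking a new system requires only a thin adapter implementing a small set of CRUD-style operations, while workloads are defined as mixes of those operations.

    import java.util.List;
    import java.util.Map;
    import java.util.Set;

    // Hypothetical sketch of a pluggable client layer for a cloud
    // serving benchmark. The workload generator issues these CRUD-style
    // calls against an adapter and records the observed latencies.
    public abstract class BenchmarkClient {

        // Called once per client thread before the workload starts
        // (e.g., open connections to the data store).
        public void init() { }

        // Read one record; a null field set means "all fields".
        public abstract boolean read(String table, String key,
                                     Set<String> fields,
                                     Map<String, String> result);

        // Overwrite fields of an existing record.
        public abstract boolean update(String table, String key,
                                       Map<String, String> values);

        // Insert a new record.
        public abstract boolean insert(String table, String key,
                                       Map<String, String> values);

        // Range scan: up to recordCount records starting at startKey.
        public abstract boolean scan(String table, String startKey,
                                     int recordCount, Set<String> fields,
                                     List<Map<String, String>> result);

        // Called once per thread after the workload finishes.
        public void cleanup() { }
    }

Under this sketch, a workload is just a parameterized mix of these operations (for example, 95% reads and 5% updates drawn from a skewed key distribution), so comparing systems reduces to swapping adapter classes while holding the workload fixed.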
