A Survey on Benchmarks for Big Data and Some More Considerations

A big data benchmark suite is needed eagerly by customers, industry and academia recently. A number of prominent works in last several years are reviewed, their characteristics are introduced and shortcomings are analyzed. The authors also provide some suggestions on building the expected benchmark, including: component based benchmarks as well as end-to-end benchmarks should be used together to test distinct tools and test the system as a whole; workloads should be enriched with complex analytics to encompass different application scenarios; metrics other than performance metrics should also be considered.

[1]  Timothy G. Armstrong,et al.  LinkBench: a database benchmark based on the Facebook social graph , 2013, SIGMOD '13.

[2]  Jie Huang,et al.  The HiBench benchmark suite: Characterization of the MapReduce-based data analysis , 2010, 2010 IEEE 26th International Conference on Data Engineering Workshops (ICDEW 2010).

[3]  Magdalena Balazinska,et al.  Analyzing massive astrophysical datasets: Can Pig/Hadoop or a relational DBMS help? , 2009, 2009 IEEE International Conference on Cluster Computing and Workshops.

[4]  S. Gurumurthi,et al.  A Benchmark Suite for Unstructured Data Processing , 2007, Fourth International Workshop on Storage Network Architecture and Parallel I/Os (SNAPI 2007).

[5]  Gang Lu,et al.  CloudRank-D: benchmarking and ranking cloud computing systems for data processing applications , 2012, Frontiers of Computer Science.

[6]  Kiyoung Kim,et al.  MRBench: A Benchmark for MapReduce Framework , 2008, 2008 14th IEEE International Conference on Parallel and Distributed Systems.

[7]  Seyong Lee,et al.  PUMA: Purdue MapReduce Benchmarks Suite , 2012 .

[8]  Rim Moussa TPC-H Benchmark Analytics Scenarios and Performances on Hadoop Data Clouds , 2012, NDT.

[9]  Adam Silberstein,et al.  Benchmarking cloud serving systems with YCSB , 2010, SoCC '10.

[10]  Michael Stonebraker,et al.  A comparison of approaches to large-scale data analysis , 2009, SIGMOD Conference.

[11]  Yanpei Chen,et al.  From TPC-C to Big Data Benchmarks: A Functional Workload Model , 2012, WBDB.