Towards a Complete BigBench Implementation

BigBench was the first proposal for an end-to-end big data analytics benchmark. It features a set of 30 realistic queries based on real big data use cases, and it was fully specified and completely implemented on the Hadoop stack. In this paper, we present updates on our development of a complete implementation on the Hadoop ecosystem. We focus on the changes that we have made to the data set, scaling, refresh process, and metric.
