From BigBench to TPCx-BB: Standardization of a Big Data Benchmark

With the increased adoption of Hadoop-based big data systems for analyzing data of large volume and variety, an effective, common benchmark for big data deployments is needed. Industry and academia have made a number of proposals to address this challenge. Most either use basic workloads (e.g., word counting) or port existing benchmarks to big data systems (e.g., TPC-H or TPC-DS), but some are designed specifically for big data challenges. The most comprehensive of these proposals is the BigBench benchmark, recently standardized by the Transaction Processing Performance Council as TPCx-BB. In this paper, we discuss the progress made from the original BigBench proposal to the standardized TPCx-BB. In addition, we share the thought process that went into creating the specification, describe the challenges of navigating the uncharted territory of a complex benchmark for a fast-moving technology domain, and analyze the functionality of the benchmark suite on different Hadoop- and non-Hadoop-based big data engines. We provide insights on the first official TPCx-BB result and, finally, briefly discuss other relevant and fast-growing big data analytics use cases to be addressed in future big data benchmarks.
