We Don't Know Enough to Make a Big Data Benchmark Suite - An Academia-Industry View

Benchmarks facilitate performance comparisons between comparable systems. They inform procurement decisions, configuration tuning, feature planning, deployment validation, and many other efforts in engineering, marketing, and customer support. Benchmarks become important once the underlying systems are mature enough that priorities shift beyond rapid feature addition and debugging, and once enough customers and vendors exist that performance matters. Big data systems are entering this phase. The characteristics of big data systems present unique challenges for benchmarking efforts: (1) system complexity, which makes it difficult to develop mental models; (2) use case diversity, which complicates efforts to identify representative behavior; (3) data scale, which makes it challenging to reproduce behavior; and (4) rapid system evolution, which requires that benchmarks keep pace with changes in the underlying systems. The position of this paper comes from an unprecedented empirical analysis of seven production workloads of MapReduce, an important class of big data systems. The main lesson we learned is that we know very little about real-life use cases of big data systems. Without empirical insights from real-life deployments, both vendors and customers often hold incorrect assumptions about their own workloads. Scientifically speaking, we are not quite ready to declare anything worthy of the label “big data benchmark.” Nonetheless, we should encourage further measurement, exploration, and development of stopgap tools.
