Experience from Hadoop Benchmarking with HiBench: From Micro-Benchmarks Toward End-to-End Pipelines

As Hadoop-based big data frameworks grow in pervasiveness and scale, realistic benchmarking of Hadoop systems becomes critically important to the Hadoop community and industry. In this paper, we present our experience of benchmarking Hadoop with HiBench (an open-source Hadoop benchmark suite widely used by Hadoop users) and, building on that experience, introduce our recent work on advanced end-to-end ETL-recommendation pipelines.
