Testing of Big Data Analytics Systems by Benchmark

With the rapid development of big data technologies and applications, open source communities and industry have released a variety of big data analytics systems. Testing and evaluating the overall performance of these systems has therefore become an important research topic. This paper analyzes in detail the challenges of testing big data analytics systems and proposes a method and strategies for such testing. It also presents two case studies of testing big data analytics systems with benchmarks.
