Performance Evaluation of Spark SQL Using BigBench

In this paper we present initial results from our work on executing BigBench on Spark. First, we evaluated the scalability of the existing MapReduce implementation of BigBench. Next, we executed the group of 14 pure HiveQL queries on Spark SQL and compared the results with the corresponding Hive results. Our experiments show that (1) for both Hive and Spark SQL, BigBench queries scale, on average, better than linearly as the data size increases, and (2) the pure HiveQL queries run faster on Spark SQL than on Hive.
