Scalability Evaluation of Big Data Processing Services in Clouds

Many cloud providers now offer their big data processing systems as cloud services, which lets users conveniently manage and process their data in the cloud. Evaluating and comparing the scalability of these services across providers is both interesting and challenging. Most traditional benchmark tools focus on performance metrics of big data processing systems, such as aggregate throughput and IOPS, but do not analyze scalability quantitatively. In this paper, we propose a measurement methodology that quantifies the scalability of big data processing services, making the scalability of different cloud services comparable. We conduct a set of comparative experiments on AliCloud E-MapReduce and Baidu MRS and collect their scalability characteristics under Hadoop and Spark workloads. The scalability characteristics we observe can help cloud users choose the cloud service platform best suited to building a big data processing system that meets their specific goals.
