BigDataBench-MT: A Benchmark Tool for Generating Realistic Mixed Data Center Workloads

Long-running service workloads (e.g. web search engine) and short-term data analysis workloads (e.g. Hadoop MapReduce jobs) co-locate in today’s data centers. Developing realistic benchmarks to reflect such practical scenario of mixed workload is a key problem to produce trustworthy results when evaluating and comparing data center systems. This requires using actual workloads as well as guaranteeing their submissions to follow patterns hidden in real-world traces. However, existing benchmarks either generate actual workloads based on probability models, or replay real-world workload traces using basic I/O operations. To fill this gap, we propose a benchmark tool that is a first step towards generating a mix of actual service and data analysis workloads on the basis of real workload traces. Our tool includes a combiner that enables the replaying of actual workloads according to the workload traces, and a multi-tenant generator that flexibly scales the workloads up and down according to users’ requirements. Based on this, our demo illustrates the workload customization and generation process using a visual interface. The proposed tool, called BigDataBench-MT, is a multi-tenant version of our comprehensive benchmark suite BigDataBench and it is publicly available from http://prof.ict.ac.cn/BigDataBench/multi-tenancyversion/.

[1]  Dhabaleswar K. Panda,et al.  A Micro-benchmark Suite for Evaluating Hadoop MapReduce on High-Performance Networks , 2014, BPOE@ASPLOS/VLDB.

[2]  Werner Vogels,et al.  Dynamo: amazon's highly available key-value store , 2007, SOSP.

[3]  Raghunath Othayoth Nambiar A standard for benchmarking big data systems , 2014, 2014 IEEE International Conference on Big Data (Big Data).

[4]  Shahram Ghandeharizadeh,et al.  BG: A Benchmark to Evaluate Interactive Social Networking Actions , 2013, CIDR.

[5]  Luiz André Barroso,et al.  The tail at scale , 2013, CACM.

[6]  P. Cochat,et al.  Et al , 2008, Archives de pediatrie : organe officiel de la Societe francaise de pediatrie.

[7]  S.Suganthi,et al.  Cassandra-A Decentralized Structured Storage System , 2017 .

[8]  Jie Huang,et al.  The HiBench benchmark suite: Characterization of the MapReduce-based data analysis , 2010, 2010 IEEE 26th International Conference on Data Engineering Workshops (ICDEW 2010).

[9]  Michael Stonebraker,et al.  A comparison of approaches to large-scale data analysis , 2009, SIGMOD Conference.

[10]  Babak Falsafi,et al.  Clearing the Clouds: A Study of Emerging Workloads on Modern Hardware , 2011 .

[11]  Timothy G. Armstrong,et al.  LinkBench: a database benchmark based on the Facebook social graph , 2013, SIGMOD '13.

[12]  Babak Falsafi,et al.  Clearing the clouds: a study of emerging scale-out workloads on modern hardware , 2012, ASPLOS XVII.

[13]  Xiaoyi Lu,et al.  On Big Data Benchmarking , 2014, BPOE@ASPLOS/VLDB.

[14]  Moustafa Ghanem,et al.  Future Generation Computer Systems ( ) – Future Generation Computer Systems Enabling Cost-aware and Adaptive Elasticity of Multi-tier Cloud Applications , 2022 .

[15]  Yuqing Zhu,et al.  BigDataBench: A big data benchmark suite from internet services , 2014, 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA).

[16]  Prashant Malik,et al.  Cassandra: a decentralized structured storage system , 2010, OPSR.

[17]  Yanpei Chen,et al.  Interactive Analytical Processing in Big Data Systems: A Cross-Industry Study of MapReduce Workloads , 2012, Proc. VLDB Endow..

[18]  Vikram A. Saletore,et al.  HcBench: Methodology, development, and characterization of a customer usage representative big data/Hadoop benchmark , 2013, 2013 IEEE International Symposium on Workload Characterization (IISWC).

[19]  Alexandru Iosup,et al.  Graphalytics: A Big Data Benchmark for Graph-Processing Platforms , 2015, GRADES@SIGMOD/PODS.

[20]  Chunjie Luo,et al.  BDGS: A Scalable Big Data Generator Suite in Big Data Benchmarking , 2013, WBDB.

[21]  Adam Silberstein,et al.  Benchmarking cloud serving systems with YCSB , 2010, SoCC '10.

[22]  Jianfeng Zhan,et al.  SARP: producing approximate results with small correctness losses for cloud interactive services , 2015, Conf. Computing Frontiers.

[23]  Randy H. Katz,et al.  Heterogeneity and dynamicity of clouds at scale: Google trace analysis , 2012, SoCC '12.

[24]  Andrew W. Moore,et al.  X-means: Extending K-means with Efficient Estimation of the Number of Clusters , 2000, ICML.

[25]  Sara Bouchenak,et al.  MRBS: Towards Dependability Benchmarking for Hadoop MapReduce , 2012, Euro-Par Workshops.

[26]  Lei Gao,et al.  Serving large-scale batch computed data with project Voldemort , 2012, FAST.