Paper information - BigDataBench-MT: A Benchmark Tool for Generating Realistic Mixed Data Center Workloads

Long-running service workloads (e.g., web search engines) and short-term data analysis workloads (e.g., Hadoop MapReduce jobs) are co-located in today’s data centers. Developing realistic benchmarks that reflect this practical mixed-workload scenario is key to producing trustworthy results when evaluating and comparing data center systems. This requires using actual workloads and guaranteeing that their submissions follow the patterns hidden in real-world traces. However, existing benchmarks either generate actual workloads based on probability models or replay real-world workload traces using only basic I/O operations. To fill this gap, we propose a benchmark tool that is a first step towards generating a mix of actual service and data analysis workloads on the basis of real workload traces. Our tool includes a combiner that replays actual workloads according to the workload traces, and a multi-tenant generator that flexibly scales the workloads up and down according to users’ requirements. Our demo illustrates the workload customization and generation process through a visual interface. The proposed tool, called BigDataBench-MT, is a multi-tenant version of our comprehensive benchmark suite BigDataBench, and it is publicly available from http://prof.ict.ac.cn/BigDataBench/multi-tenancyversion/.
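The two ideas named in the abstract, replaying workloads at the submission times recorded in a trace and scaling the number of submissions up or down, can be illustrated with a minimal Python sketch. This is not the BigDataBench-MT implementation: the `TraceEntry` record, the `scale_trace` and `replay` functions, the workload labels, and the proportional scaling rule are all hypothetical and chosen only for illustration.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class TraceEntry:
    """One submission in a workload trace (hypothetical format)."""
    submit_time: float  # seconds since the start of the trace
    workload: str       # illustrative label, e.g. "search" or "mapreduce"


def scale_trace(trace: List[TraceEntry], factor: float) -> List[TraceEntry]:
    """Scale the number of submissions by `factor`, keeping arrival order.

    Assumed rule: every entry adds `factor` to a running credit, and one
    copy of the entry is emitted for each whole unit of credit. A factor
    of 2.0 duplicates each submission; 0.5 keeps every second one.
    """
    if factor < 0:
        raise ValueError("factor must be non-negative")
    scaled: List[TraceEntry] = []
    credit = 0.0
    for entry in sorted(trace, key=lambda e: e.submit_time):
        credit += factor
        while credit >= 1.0 - 1e-9:  # small tolerance for float rounding
            scaled.append(entry)
            credit -= 1.0
    return scaled


def replay(
    trace: List[TraceEntry],
    launchers: Dict[str, Callable[[TraceEntry], None]],
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Launch each workload when its recorded submission time is reached.

    `launchers` maps a workload label to a callable that starts the actual
    workload. `clock` and `sleep` are injectable so the replay can be
    checked without waiting in real time.
    """
    start = clock()
    for entry in sorted(trace, key=lambda e: e.submit_time):
        delay = entry.submit_time - (clock() - start)
        if delay > 0:
            sleep(delay)
        launchers[entry.workload](entry)
```

In this sketch a caller would first pass a trace through `scale_trace` and then hand the result to `replay`, with `launchers` wrapping whatever commands start the real service requests or analysis jobs.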
