Hadoop Workloads Characterization for Performance and Energy Efficiency Optimizations on Microservers

Traditional low-power embedded processors such as Atom and ARM are entering the high-performance server market. At the same time, big data analytics applications are emerging and dramatically changing the landscape of data center workloads. These applications require a significant amount of server computational power. However, the rapid growth in data volume makes it challenging to process that data efficiently on current high-performance server architectures. Furthermore, physical design constraints, such as power and density, have become the dominant limiting factors for scaling out servers. Numerous big data applications rely on the Hadoop MapReduce framework to analyze large-scale datasets. Since Hadoop configuration parameters as well as system parameters directly affect MapReduce job performance and energy efficiency, joint tuning of application-, system-, and architecture-level parameters is vital to maximize the energy efficiency of Hadoop-based applications. In this work, through methodical investigation of performance and power measurements, we demonstrate how the interplay among various Hadoop configuration parameters, as well as system- and architecture-level parameters, affects not only the performance but also the energy efficiency of various big data applications. Our results identify trends to guide scheduling decisions and key insights to help improve the performance, power, and energy efficiency of Hadoop MapReduce applications on microservers.
