Co-locating and concurrent fine-tuning MapReduce applications on microservers for energy efficiency

Datacenters provide flexibility and high performance for users and cost efficiency for operators. However, big data and analytics technologies such as MapReduce, a dominant programming model and framework for big data analytics, have high computational demands. As a result, even small changes in execution efficiency in the datacenter can have a large effect on both user cost and operational cost. Fine-tuning the configuration parameters of MapReduce applications at the application, architecture, and system levels plays a crucial role in improving server energy efficiency and reducing operational cost. In this work, through a methodical investigation of performance and power measurements, we demonstrate how the interplay among various MapReduce configurations and application- and architecture-level parameters creates new opportunities to co-locate MapReduce applications at the node level. We also show that concurrently fine-tuning optimization parameters for multiple scheduled MapReduce applications improves energy efficiency compared to fine-tuning the parameters of each application separately. We present Co-Located Application Optimization (COLAO), which co-schedules multiple MapReduce applications at the node level to enhance energy efficiency. Our results show that by co-locating MapReduce applications and fine-tuning their configuration parameters concurrently, COLAO halves the number of nodes needed to execute MapReduce applications. It also improves the energy-delay product (EDP) by 2.2X on average across a broad range of studied workloads, compared to fine-tuning applications individually and running them serially.
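To make the EDP metric concrete, the following minimal Python sketch computes energy as average power times execution time and EDP as energy times execution time. It then compares two applications run back to back against the same pair co-located on one node. The power and time values are hypothetical placeholders, not measurements from this work. Modeling the serial baseline as two consecutive runs on a single node is also an assumption. The names `Run`, `serial_edp`, and `edp_improvement` are illustrative only.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class Run:
    """One measured execution: average power draw (W) and wall-clock time (s)."""
    avg_power_w: float
    time_s: float

    @property
    def energy_j(self) -> float:
        # Energy (J) = average power (W) x execution time (s)
        return self.avg_power_w * self.time_s

    @property
    def edp(self) -> float:
        # Energy-delay product (J*s); lower is better
        return self.energy_j * self.time_s


def serial_edp(runs: Sequence[Run]) -> float:
    """EDP when runs execute one after another: energies add and delays add."""
    total_energy = sum(r.energy_j for r in runs)
    total_time = sum(r.time_s for r in runs)
    return total_energy * total_time


def edp_improvement(baseline_edp: float, candidate_edp: float) -> float:
    """Ratio > 1 means the candidate has a lower (better) EDP than the baseline."""
    return baseline_edp / candidate_edp


# Hypothetical numbers for illustration only (not measured data).
app_a = Run(avg_power_w=40.0, time_s=100.0)
app_b = Run(avg_power_w=40.0, time_s=120.0)
# Both applications sharing one node: higher power, but a shorter total makespan.
colocated = Run(avg_power_w=55.0, time_s=150.0)

baseline = serial_edp([app_a, app_b])
print(f"serial EDP:     {baseline:,.0f} J*s")
print(f"co-located EDP: {colocated.edp:,.0f} J*s")
print(f"improvement:    {edp_improvement(baseline, colocated.edp):.2f}x")
```

With these placeholder values, the co-located run has roughly 1.56x lower EDP than the serial baseline. Because EDP multiplies energy by delay, co-location pays off only when the shorter makespan outweighs the extra power drawn by running applications together. The 2.2X figure reported above comes from the paper's measurements, not from this toy model.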
