Power Characterization of Memory Intensive Applications: Analysis and Implications

DRAM is a significant source of server power consumption, especially when a server runs memory-intensive applications. Current power-aware scheduling assumes that DRAM is as energy proportional as other components. However, the non-energy-proportionality of DRAM significantly affects the power and energy consumption of the whole server system when it runs memory-intensive applications. Good knowledge of server power characteristics under memory-intensive workloads can therefore guide workload placement that reduces power. In this paper, we investigate the power characteristics of memory-intensive applications on real rack servers of different generations. Through comprehensive analysis we find that (1) server power consumption changes with workload intensity and the number of concurrently executing threads, but a fully utilized memory system is not the most energy-efficient operating point; (2) the number of powered memory modules, i.e., the installed memory capacity per processor core, has a significant impact on application performance and server power consumption even when the memory system is not fully utilized; and (3) memory utilization is not always a good indicator of server power consumption for memory-intensive applications. Our experiments show that hardware configuration, workload type, and the number of concurrently running threads all have a significant impact on a server's energy efficiency when it runs memory-intensive applications. The findings presented in this paper provide useful insights and guidance to system designers and data center operators for energy-efficiency-aware job scheduling and power reduction.
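As a minimal sketch of the kind of characterization described above (not code or data from the paper), the following Python snippet computes energy efficiency as work done per joule at several workload-intensity levels and compares measured power against an ideal, fully energy-proportional baseline. The sample values, idle and peak power figures, and function names are illustrative assumptions.

    # Hypothetical sketch: energy efficiency (operations per joule) at several
    # memory-workload intensity levels, compared against a perfectly
    # energy-proportional power baseline. All numbers are assumed, not measured.

    # (utilization %, measured average power in watts, throughput in ops/sec)
    samples = [
        (10, 120.0, 1.0e6),
        (50, 180.0, 4.5e6),
        (100, 260.0, 7.0e6),
    ]

    PEAK_POWER = 260.0  # assumed power at 100% utilization (W)


    def ideal_power(utilization: float) -> float:
        """Perfectly energy-proportional baseline: power scales linearly
        with load and is zero at idle."""
        return PEAK_POWER * utilization / 100.0


    for util, power_w, ops_per_sec in samples:
        efficiency = ops_per_sec / power_w          # operations per joule
        gap = power_w - ideal_power(util)           # non-proportionality cost
        print(f"{util:3d}% load: {efficiency:,.0f} ops/J, "
              f"{gap:+.0f} W above the proportional baseline")

In such a characterization the most energy-efficient operating point is the one that maximizes operations per joule, which, as noted above, need not coincide with a fully utilized memory system.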
