Contention-aware application performance prediction for disaggregated memory systems

Disaggregated memory has recently been proposed as a way to allow flexible and fine-grained allocation of memory capacity to compute jobs. This paper takes an important step towards effective resource allocation on disaggregated memory systems. Specifically, we propose a generic approach to predict the performance degradation caused by sharing disaggregated memory. In contrast to the setting of prior work, cache capacity is not shared among multiple applications, which removes a major source of performance interference. For this reason, our analysis is driven by the demand for memory bandwidth, which has been shown to have an important effect on application performance. We show that profiling the application slowdown often involves significant experimental error and noise, and we improve the accuracy by linear smoothing of the sensitivity curves. We also show that contention is sensitive to the ratio between read and write memory accesses, and we address this sensitivity by building a family of sensitivity curves, one for each read/write ratio. Our results show that the methodology predicts the slowdown in application performance under memory contention with an average error of 1.19% and a maximum error of 14.6%. Compared with the state of the art, the relative improvements are almost 24% on average and 33% in the worst case.
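As an illustration only, the following Python sketch shows one way a family of smoothed sensitivity curves could be built and queried. It is not the paper's implementation. It assumes that slowdown is the ratio of contended to uncontended runtime, that "linear smoothing" can be approximated by an ordinary least-squares line per curve, that each curve is keyed by the fraction of reads in the contending traffic, and that the nearest curve is used for prediction. All function names and the sample numbers are hypothetical.

```python
def fit_line(xs, ys):
    """Least-squares fit y = intercept + slope * x.

    Assumption: a straight-line fit stands in for the 'linear smoothing'
    mentioned in the abstract; the paper's exact procedure may differ.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def build_curves(profile):
    """Build one smoothed sensitivity curve per read fraction.

    profile maps a read fraction (0.0 to 1.0) to a list of
    (contending_bandwidth, measured_slowdown) samples.
    """
    curves = {}
    for read_fraction, samples in profile.items():
        bandwidths = [bw for bw, _ in samples]
        slowdowns = [s for _, s in samples]
        curves[read_fraction] = fit_line(bandwidths, slowdowns)
    return curves


def predict_slowdown(curves, read_fraction, bandwidth):
    """Predict slowdown using the curve with the closest read fraction."""
    nearest = min(curves, key=lambda r: abs(r - read_fraction))
    intercept, slope = curves[nearest]
    # A slowdown below 1.0 would mean a speedup, so clamp at 1.0.
    return max(1.0, intercept + slope * bandwidth)


# Hypothetical noisy profiling samples (bandwidth in GB/s, slowdown as a ratio).
profile = {
    1.0: [(0, 1.00), (10, 1.06), (20, 1.09), (30, 1.16)],
    0.5: [(0, 1.01), (10, 1.09), (20, 1.22), (30, 1.29)],
}
curves = build_curves(profile)
print(predict_slowdown(curves, read_fraction=0.6, bandwidth=25))
```

Selecting the nearest curve keeps the sketch short; interpolating between neighbouring read/write ratios would be an equally plausible design.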
