BALM: QoS-Aware Memory Bandwidth Partitioning for Multi-Socket Cloud Nodes

The recent emergence of hardware-based resource partitioning mechanisms has opened the door to a new generation of QoS-aware resource allocation approaches for workload consolidation. Still, to the best of our knowledge, existing proposals are, by design, not tailored to the multi-socket systems that are increasingly prevalent in contemporary warehouse-scale data centers. We propose BALM, a QoS-aware memory bandwidth allocation technique for multi-socket architectures that combines commodity bandwidth allocation mechanisms with a novel adaptive cross-socket page migration scheme. Our experimental evaluation with real applications on a dual-socket machine shows that BALM overcomes the efficiency limitations of state-of-the-art approaches: it keeps SLO violation windows marginal while delivering up to 87% throughput gains to bandwidth-intensive best-effort applications compared with state-of-the-art alternatives.
