Multiple Physical Mappings: Dynamic DRAM Channel Sharing and Partitioning

When an OS allocates memory to a process, it implicitly performs long-term scheduling of DRAM resources such as channels and banks: each mapped page frame allows memory operations to send requests to the channels and DRAM banks that back that page frame. The OS should be able to choose dynamically between sharing and dedicating these resources, yet it cannot do so on conventional systems. On our 4-core prototype platform, we observed slowdowns of up to 36% from DRAM interference for some combinations of workloads, caused by the uncontrolled sharing of DRAM channels in the typical channel-interleaving configuration. Previous work proposed channel partitioning to mitigate that interference, but partitioning reduces the maximum throughput of individual applications even when workloads do not interfere. Our approach enables the OS to choose between channel interleaving and channel partitioning at run time, at the granularity of address space (AS) segments. To that end, we map DRAM into the physical AS multiple times: once as a dedicated region per channel for partitioning, and once more as a region that interleaves all channels. We implement this approach on commodity hardware and change the OS's memory management so that it can dedicate channels to processes, or share channels between processes with interleaving, by choosing page frames from the appropriate region. As a result, we can switch to the configuration that achieves the best execution speed and system throughput while applications are running (e.g., when workloads change), whereas a conventional system has to choose between interleaving and partitioning at boot time.
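
The multiple-mapping idea can be illustrated with a small address-decoding sketch. It is not the implementation described here. The channel count, channel size, 64-byte interleaving granule, and the placement of the regions in the physical AS are assumptions chosen for illustration, and the names `decode` and `channels_of_frame` are hypothetical.

```python
# Illustrative model only: all constants and the region layout are assumptions.
PAGE_SIZE = 4096            # bytes per page frame
NUM_CHANNELS = 2            # assumed number of DRAM channels
CHANNEL_SIZE = 1 << 30      # assumed capacity per channel (1 GiB)
INTERLEAVE_GRANULE = 64     # assumed interleaving granularity in bytes

# Assumed layout: one interleaved alias of all DRAM, followed by one
# dedicated alias per channel.
INTERLEAVED_BASE = 0
INTERLEAVED_SIZE = NUM_CHANNELS * CHANNEL_SIZE
DEDICATED_BASE = INTERLEAVED_BASE + INTERLEAVED_SIZE


def decode(paddr):
    """Return (channel, offset within channel) for a physical address."""
    if INTERLEAVED_BASE <= paddr < DEDICATED_BASE:
        # Consecutive granules rotate across the channels.
        block, byte = divmod(paddr - INTERLEAVED_BASE, INTERLEAVE_GRANULE)
        row, channel = divmod(block, NUM_CHANNELS)
        return channel, row * INTERLEAVE_GRANULE + byte
    rel = paddr - DEDICATED_BASE
    if 0 <= rel < NUM_CHANNELS * CHANNEL_SIZE:
        # Each dedicated region maps linearly onto exactly one channel.
        channel, offset = divmod(rel, CHANNEL_SIZE)
        return channel, offset
    raise ValueError("address outside the modelled DRAM regions")


def channels_of_frame(frame_base):
    """Return the set of channels that back one page frame."""
    return {
        decode(frame_base + off)[0]
        for off in range(0, PAGE_SIZE, INTERLEAVE_GRANULE)
    }


if __name__ == "__main__":
    print(channels_of_frame(INTERLEAVED_BASE))               # frame in the interleaved region
    print(channels_of_frame(DEDICATED_BASE))                 # frame dedicated to channel 0
    print(channels_of_frame(DEDICATED_BASE + CHANNEL_SIZE))  # frame dedicated to channel 1
```

In this model a page frame taken from the interleaved region spreads its accesses over all channels, while a frame taken from a dedicated region touches only one. Because both kinds of region alias the same DRAM, an allocator built on such a layout would also have to ensure that a given DRAM location is handed out through at most one alias at a time.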
