Efficient remapping mechanisms for an adaptable memory system

The speed gap between processors and memory continues to widen, which has led to increased reliance on complex cache hierarchies. Caches are very effective for programs that achieve near-100% hit rates, but they fail on many important applications that do not exhibit sufficient data locality. Similarly, TLBs often fail to hide the latency of virtual-to-physical address translation. A TLB typically contains only 64 to 512 entries, so applications with poor data locality and large working sets suffer low TLB hit rates. We propose to attack these problems with an extra level of address translation at the main memory controller, an idea first introduced by Swanson, Stoller, and Carter [93]. Remapping physical addresses at the memory controller allows data structures with poor locality to be reorganized into data structures with high locality without copying, which can significantly increase TLB and cache hit rates.

This dissertation investigates the effectiveness of physical address remapping at the memory controller by exploring five efficient remapping mechanisms in three families: remapping strided accesses, remapping through an indirection vector, and remapping-based page coloring. Remapping strided accesses reorganizes sparse data items distributed along fixed strides into dense regions. We consider two strided remapping algorithms: stride remapping creates dense cache lines from data items whose virtual addresses are distributed along a nonunit stride, and transpose remapping creates the transpose of a two-dimensional matrix. Remapping through an indirection vector packs dense cache lines from disjoint memory locations according to addresses stored in an indirection vector, enabling application programs to access arbitrarily distributed data items as if they were stored sequentially. We consider both static and dynamic forms of this remapping.
The static form requires that the indirection vector already exist in the program; the dynamic form creates a small indirection vector at run time. Remapping-based page coloring performs coarse-grained remapping at the memory controller, recoloring physical pages without copying. This dissertation explores different ways to implement these remapping mechanisms and compares them with conventional data reorganization methods that employ copying or a relocation buffer.

We evaluate the performance of these remapping mechanisms on a suite of benchmarks that perform poorly on conventional memory systems, using both execution-driven simulation and analytical modeling. The results show that address remapping at the memory controller improves the performance of these benchmarks by an average of 124%. Remapping outperforms data reorganization via copying by an average of 66% and reorganization via a relocation buffer by an average of 61%.
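The address arithmetic behind stride, transpose, and indirection-vector remapping can be illustrated with a minimal Python sketch. This is not the dissertation's implementation. It assumes a hypothetical remapped (alias) address region whose byte offsets the memory controller translates into the physical addresses where the data actually resides. It also assumes a fixed 8-byte element size and uses a dict to stand in for DRAM. The function names (`stride_remap`, `transpose_remap`, `indirect_remap`, `gather_line`) are illustrative only. Each translation function maps a dense alias offset to a scattered physical address. `gather_line` then assembles one dense cache line from those locations, which is how the processor can see reorganized data without any copying.

```python
from functools import partial
from typing import Callable, Dict, List

ELEM = 8  # assumed element size in bytes (illustrative)


def stride_remap(alias_off: int, base: int, stride: int, elem: int = ELEM) -> int:
    """Stride remapping: dense alias element i lives at base + i * stride."""
    i, byte = divmod(alias_off, elem)  # element index and byte within element
    return base + i * stride + byte


def transpose_remap(alias_off: int, base: int, rows: int, cols: int,
                    elem: int = ELEM) -> int:
    """Transpose remapping: the alias region is the row-major cols x rows
    transpose T of a row-major rows x cols matrix A, with T[c][r] = A[r][c]."""
    k, byte = divmod(alias_off, elem)
    c, r = divmod(k, rows)  # position (c, r) within the transpose
    return base + (r * cols + c) * elem + byte


def indirect_remap(alias_off: int, base: int, ivec: List[int],
                   elem: int = ELEM) -> int:
    """Indirection-vector remapping: alias element i lives at
    base + ivec[i] * elem, i.e. a gather driven by the vector's indices."""
    i, byte = divmod(alias_off, elem)
    return base + ivec[i] * elem + byte


def gather_line(mem: Dict[int, int], alias_line: int, line_size: int,
                translate: Callable[[int], int]) -> List[int]:
    """Assemble one dense cache line: translate each element's alias offset
    and read the word stored at the resulting physical address."""
    return [mem[translate(alias_line + off)] for off in range(0, line_size, ELEM)]


if __name__ == "__main__":
    A, ROWS, COLS = 0x1000, 4, 4
    # Toy "DRAM": A[r][c] = 10*r + c stored row-major, one word per element.
    mem = {A + (r * COLS + c) * ELEM: 10 * r + c
           for r in range(ROWS) for c in range(COLS)}

    # Column 0 of A as one dense 32-byte line, using a one-row stride.
    print(gather_line(mem, 0, 32, partial(stride_remap, base=A, stride=COLS * ELEM)))
    # prints [0, 10, 20, 30]

    # The first line of the transpose is also column 0 of A.
    print(gather_line(mem, 0, 32, partial(transpose_remap, base=A, rows=ROWS, cols=COLS)))
    # prints [0, 10, 20, 30]

    # Arbitrary elements gathered through a hypothetical indirection vector.
    print(gather_line(mem, 0, 32, partial(indirect_remap, base=A, ivec=[5, 0, 15, 10])))
    # prints [11, 0, 33, 22]
```

With a one-row stride, the first stride-remapped line and the first transposed line coincide. The difference is that successive transposed lines walk across successive columns of the matrix, while a single stride mapping follows only one column.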