Fine-grained write scheduling for PCM performance improvement under write power budget

Phase-change memory (PCM) has gained much attention recently because it offers several advantages over DRAM, such as high cell density and low leakage power. PCM's read power and latency are similar to DRAM's; however, its write power and latency are significantly higher. One challenge with PCM is therefore how to increase write throughput under a write power budget constraint. To increase write concurrency, PCM often adopts division programming, in which a write is carried out as a series of divisions so that writes to different banks can proceed concurrently. In this study, we observe that the available power budget cannot be fully utilized because the write scheduling granularity in the memory controller (requests) differs from the actual write granularity in PCM chips (divisions). We therefore propose enhancing the interface between the memory controller and PCM chips so that the memory controller can schedule writes at division granularity. To further increase power budget utilization, we design a variable-length division mechanism that allows the division granularity to be adjusted at runtime according to the available write power budget. Our experimental results show that these techniques improve system performance by up to 65%.
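
The granularity mismatch described above can be illustrated with a toy model. The Python sketch below is not the scheduler evaluated in this work; it assumes, purely for illustration, that each write request is a list of per-division power demands, that every division takes one time slot, and that a request-granularity controller reserves a request's peak division power for the request's whole duration. The function names and the numbers in the usage note are hypothetical.

```python
from typing import List


def _validate(requests: List[List[int]], budget: int) -> None:
    # Assumption: every request has at least one division and no single
    # division exceeds the budget; otherwise neither scheduler can progress.
    for req in requests:
        if not req or max(req) > budget:
            raise ValueError("each request needs divisions that fit the budget")


def request_granularity_slots(requests: List[List[int]], budget: int) -> int:
    """Slots needed when power is reserved per request (peak division power)."""
    _validate(requests, budget)
    pending = [list(r) for r in requests]
    active = []  # entries are [reserved_power, remaining_slots]
    slots = 0
    while pending or active:
        used = sum(power for power, _ in active)
        # Admit requests in FIFO order while their peak reservation fits.
        while pending and used + max(pending[0]) <= budget:
            req = pending.pop(0)
            active.append([max(req), len(req)])
            used += max(req)
        slots += 1
        for entry in active:
            entry[1] -= 1
        active = [entry for entry in active if entry[1] > 0]
    return slots


def division_granularity_slots(requests: List[List[int]], budget: int) -> int:
    """Slots needed when power is checked per division in every slot."""
    _validate(requests, budget)
    queues = [list(r) for r in requests]  # remaining divisions per request
    slots = 0
    while any(queues):
        used = 0
        # Earlier requests get priority; each issues at most one division per slot.
        for queue in queues:
            if queue and used + queue[0] <= budget:
                used += queue.pop(0)
        slots += 1
    return slots
```

For two hypothetical requests with division demands [6, 2, 2, 2] each and a budget of 8, the request-granularity model serializes the two writes, because their peak reservations (6 + 6) exceed the budget, and needs 8 slots. The division-granularity model overlaps the low-power divisions of one write with the divisions of the other and finishes in 5 slots. The variable-length division mechanism is not modeled here.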
