Throughput Enhancement for Phase Change Memories

Phase Change Memory (PCM) has emerged as a promising candidate for future memories. PCM has high cell density, zero cell leakage, and high stability in deep sub-micron technologies. Although PCM has limited endurance, recent work has shown that its lifetime can be improved by orders of magnitude. However, a major hurdle for PCM is its long write latency and high write power. For this reason, PCM cannot deliver satisfactory memory bandwidth for high-end computing environments such as multi-processing and server systems. In this paper, we develop a non-blocking PCM bank design in which subsequent reads or writes can be carried out in parallel with an ongoing write. This removes the long blocking time caused by serial operations. Moreover, we propose novel memory request scheduling algorithms that exploit the intra-bank parallelism provided by our non-blocking hardware. Our non-blocking hardware with the scheduling enhancement improves PCM memory throughput by 51% on average. Finally, we propose a fine-grained power budgeting scheme that achieves further throughput improvement under a power budget. Experiments show that our scheduler enhanced with the power budgeting scheme achieves a throughput improvement of 118% on average.
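
To make the idea of intra-bank parallelism concrete, the toy model below compares a bank that serializes every request with one that lets requests proceed while a write is in flight. It is only an illustrative sketch and not the design or scheduler proposed in the paper: the notion of independent "partitions" inside a bank, the latency values, and all names are assumptions introduced here, and all requests are assumed to be available at time 0.

```python
from dataclasses import dataclass

# Assumed, illustrative latencies in cycles; not figures from the paper.
READ_CYCLES = 1
WRITE_CYCLES = 8


@dataclass
class Request:
    kind: str       # "R" for read, "W" for write
    partition: int  # hypothetical independent unit inside the bank


def latency(req):
    """Service time of a single request."""
    return WRITE_CYCLES if req.kind == "W" else READ_CYCLES


def blocking_finish_time(requests):
    """Blocking bank: the whole bank is busy for every request."""
    return sum(latency(r) for r in requests)


def non_blocking_finish_time(requests, num_partitions):
    """Toy non-blocking bank: requests to different partitions overlap,
    while requests to the same partition are still served one by one."""
    busy_until = [0] * num_partitions
    for r in requests:
        busy_until[r.partition] += latency(r)
    return max(busy_until)


queue = [Request("W", 0), Request("R", 1), Request("R", 2), Request("W", 1)]
print(blocking_finish_time(queue))          # 18
print(non_blocking_finish_time(queue, 4))   # 9
```

In this toy queue the two reads and the second write no longer wait behind the first write, so the last request completes at cycle 9 instead of cycle 18. A power budget would further limit how many writes may overlap, which this sketch does not model.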
