Power and performance of read-write aware Hybrid Caches with non-volatile memories

Caches made of non-volatile memory technologies, such as Magnetic RAM (MRAM) and Phase-change RAM (PRAM), offer dramatically different power-performance characteristics when compared with SRAM-based caches, particularly in the areas of static/dynamic power consumption, read and write access latency and cell density. In this paper, we propose to take advantage of the best characteristics that each technology has to offer through the use of read-write aware Hybrid Cache Architecture (RWHCA) designs, where a single level of cache can be partitioned into read and write regions, each of a different memory technology with disparate read and write characteristics. We explore the potential of hardware support for intra-cache data movement within RWHCA caches. Utilizing a full-system simulator that has been validated against real hardware, we demonstrate that a RWHCA design with a conservative setup can provide a geometric mean 55% power reduction and yet 5% IPC improvement over a baseline SRAM cache design across a collection of 30 workloads. Furthermore, a 2-layer 3D cache stack (3DRWHCA) of high density memory technology with the same chip footprint still gives 10% power reduction and boost performance by 16% IPC improvement over the baseline.

[1]  David A. Bader,et al.  BioPerf: a benchmark suite to evaluate high-performance computer architecture on bioinformatics applications , 2005, IEEE International. 2005 Proceedings of the IEEE Workload Characterization Symposium, 2005..

[2]  Doug Burger,et al.  An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches , 2002, ASPLOS X.

[3]  Chung Lam Cell Design Considerations for Phase Change Memory as a Universal Memory , 2008, 2008 International Symposium on VLSI Technology, Systems and Applications (VLSI-TSA).

[4]  Zeshan Chishti,et al.  Optimizing replication, communication, and capacity allocation in CMPs , 2005, 32nd International Symposium on Computer Architecture (ISCA'05).

[5]  Kai Li,et al.  The PARSEC benchmark suite: Characterization and architectural implications , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).

[6]  Anoop Gupta,et al.  The SPLASH-2 programs: characterization and methodological considerations , 1995, ISCA.

[7]  R. Bez,et al.  Current status of chalcogenide phase change memory , 2005, 63rd Device Research Conference Digest, 2005. DRC '05..

[8]  David A. Wood,et al.  Managing Wire Delay in Large Chip-Multiprocessor Caches , 2004, 37th International Symposium on Microarchitecture (MICRO-37'04).

[9]  David H. Bailey,et al.  The Nas Parallel Benchmarks , 1991, Int. J. High Perform. Comput. Appl..

[10]  GuptaAnoop,et al.  The SPLASH-2 programs , 1995 .

[11]  Yiran Chen,et al.  Circuit and microarchitecture evaluation of 3D stacking magnetic RAM (MRAM) as a universal memory replacement , 2008, 2008 45th ACM/IEEE Design Automation Conference.

[12]  M. Hosomi,et al.  A novel nonvolatile memory with spin torque transfer magnetization switching: spin-ram , 2005, IEEE InternationalElectron Devices Meeting, 2005. IEDM Technical Digest..