论文信息 - Compiler-assisted preferred caching for embedded systems with STT-RAM based hybrid cache

Compiler-assisted preferred caching for embedded systems with STT-RAM based hybrid cache

As technology scales down, energy consumption is becoming a big problem for traditional SRAM-based cache hierarchies. The emerging Spin-Torque Transfer RAM (STT-RAM) is a promising replacement for large on-chip cache due to its ultra low leakage power and high storage density. However, write operations on STT-RAM suffer from considerably higher energy consumption and longer latency than SRAM. Hybrid cache consisting of both SRAM and STT-RAM has been proposed recently for both performance and energy efficiency. Most management strategies for hybrid caches employ migration-based techniques to dynamically move write-intensive data from STT-RAM to SRAM. These techniques lead to extra overheads. In this paper, we propose a compiler-assisted approach, preferred caching, to significantly reduce the migration overhead by giving migration-intensive memory blocks the preference for the SRAM part of the hybrid cache. Furthermore, a data assignment technique is proposed to improve the efficiency of preferred caching. The reduction of migration overhead can in turn improve the performance and energy efficiency of STT-RAM based hybrid cache. The experimental results show that, with the proposed techniques, on average, the number of migrations is reduced by 21.3%, the total latency is reduced by 8.0% and the total dynamic energy is reduced by 10.8%.

Yanxiang He | Chun Jason Xue | Mengying Zhao | Qing'an Li

[1] Amin Jadidi,et al. High-endurance and performance-efficient design of hybrid cache architectures through adaptive line replacement , 2011, IEEE/ACM International Symposium on Low Power Electronics and Design.

[2] Trevor Mudge,et al. MiBench: A free, commercially representative embedded benchmark suite , 2001 .

[3] Harish Patil,et al. Pin: building customized program analysis tools with dynamic instrumentation , 2005, PLDI '05.

[4] Dror Rawitz,et al. The hardness of cache conscious data placement , 2002, POPL '02.

[5] Yiran Chen,et al. Circuit and microarchitecture evaluation of 3D stacking magnetic RAM (MRAM) as a universal memory replacement , 2008, 2008 45th ACM/IEEE Design Automation Conference.

[6] Wei-Che Tseng,et al. Towards energy efficient hybrid on-chip Scratch Pad Memory with non-volatile memory , 2011, 2011 Design, Automation & Test in Europe.

[7] Xiaoxia Wu,et al. Hybrid cache architecture with disparate memory technologies , 2009, ISCA '09.

[8] Vikram S. Adve,et al. LLVM: a compilation framework for lifelong program analysis & transformation , 2004, International Symposium on Code Generation and Optimization, 2004. CGO 2004..

[9] Chandra Krintz,et al. Cache-conscious data placement , 1998, ASPLOS VIII.

[10] Vijayalakshmi Srinivasan,et al. Scalable high performance main memory system using phase-change memory technology , 2009, ISCA '09.

[11] Tao Li,et al. Exploring Phase Change Memory and 3D Die-Stacking for Power/Thermal Friendly, Fast and Durable Memory Architectures , 2009, 2009 18th International Conference on Parallel Architectures and Compilation Techniques.

[12] Jun Yang,et al. A durable and energy efficient main memory using phase change memory technology , 2009, ISCA '09.

[13] Dean M. Tullsen,et al. Compiler Techniques for Reducing Data Cache Miss Rate on a Multithreaded Architecture , 2008, HiPEAC.

[14] Onur Mutlu,et al. Architecting phase change memory as a scalable dram alternative , 2009, ISCA '09.

[15] Minming Li,et al. Power-Aware Variable Partitioning for DSPs With Hybrid PRAM and DRAM Main Memory , 2011, IEEE Transactions on Signal Processing.

[16] Yiran Chen,et al. A novel architecture of the 3D stacked MRAM L2 cache for CMPs , 2009, 2009 IEEE 15th International Symposium on High Performance Computer Architecture.

[17] Norman P. Jouppi,et al. Optimizing NUCA Organizations and Wiring Alternatives for Large Caches with CACTI 6.0 , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).

[18] Rami G. Melhem,et al. Compiler-assisted data distribution for chip multiprocessors , 2010, 2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT).

[19] Jianhua Li,et al. STT-RAM based energy-efficiency hybrid cache for CMPs , 2011, 2011 IEEE/IFIP 19th International Conference on VLSI and System-on-Chip.

[20] James R. Larus,et al. Static branch frequency and program profile analysis , 1994, MICRO 27.