Compiler-assisted preferred caching for embedded systems with STT-RAM based hybrid cache

As technology scales down, energy consumption is becoming a major problem for traditional SRAM-based cache hierarchies. The emerging Spin-Torque Transfer RAM (STT-RAM) is a promising replacement for large on-chip caches because of its ultra-low leakage power and high storage density. However, write operations on STT-RAM consume considerably more energy and have longer latency than on SRAM. Hybrid caches consisting of both SRAM and STT-RAM have recently been proposed to achieve both performance and energy efficiency. Most management strategies for hybrid caches employ migration-based techniques that dynamically move write-intensive data from STT-RAM to SRAM, but the migrations themselves incur extra overhead. In this paper, we propose a compiler-assisted approach, preferred caching, that significantly reduces the migration overhead by giving migration-intensive memory blocks preference for the SRAM part of the hybrid cache. Furthermore, we propose a data assignment technique to improve the efficiency of preferred caching. Reducing the migration overhead in turn improves the performance and energy efficiency of the STT-RAM based hybrid cache. Experimental results show that, on average, the proposed techniques reduce the number of migrations by 21.3%, the total latency by 8.0%, and the total dynamic energy by 10.8%.
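
The intuition behind giving some blocks a preference for SRAM can be illustrated with a toy model. The Python sketch below is not the mechanism evaluated in the paper; it is a minimal illustration under stated assumptions: a fully associative cache with LRU replacement in each part, a hypothetical write-count threshold (`write_threshold`) that triggers a migration from STT-RAM to SRAM, and a hypothetical `preferred` set standing in for compiler-provided hints. All class, function, and parameter names are invented for this example.

```python
from collections import OrderedDict


class HybridCache:
    """Toy hybrid cache: a small SRAM part and a larger STT-RAM part (assumed model)."""

    def __init__(self, sram_lines, stt_lines, write_threshold=2, preferred=()):
        self.sram = OrderedDict()        # block -> None, oldest entry is the LRU line
        self.stt = OrderedDict()         # block -> writes seen while in STT-RAM
        self.sram_lines = sram_lines
        self.stt_lines = stt_lines
        self.write_threshold = write_threshold  # hypothetical migration trigger
        self.preferred = set(preferred)         # hypothetical compiler hints
        self.migrations = 0

    @staticmethod
    def _insert(part, capacity, block, value):
        # Evict the least recently used line when the part is full.
        if len(part) >= capacity:
            part.popitem(last=False)
        part[block] = value

    def access(self, block, is_write):
        if block in self.sram:                   # SRAM hit
            self.sram.move_to_end(block)
            return
        if block in self.stt:                    # STT-RAM hit
            self.stt.move_to_end(block)
            if is_write:
                self.stt[block] += 1
                if self.stt[block] >= self.write_threshold:
                    # Write-intensive block: migrate it to SRAM.
                    del self.stt[block]
                    self._insert(self.sram, self.sram_lines, block, None)
                    self.migrations += 1
            return
        # Miss: hinted blocks are filled into SRAM, all others into STT-RAM.
        if block in self.preferred:
            self._insert(self.sram, self.sram_lines, block, None)
        else:
            self._insert(self.stt, self.stt_lines, block, 1 if is_write else 0)


def count_migrations(trace, preferred=()):
    """Replay (block, is_write) pairs and return the number of migrations."""
    cache = HybridCache(sram_lines=1, stt_lines=2, preferred=preferred)
    for block, is_write in trace:
        cache.access(block, is_write)
    return cache.migrations


# Block "a" is written repeatedly; "b" and "c" are only read.
TRACE = [("a", True), ("b", False), ("a", True),
         ("c", False), ("a", True), ("b", False)]

print(count_migrations(TRACE))                   # no hints
print(count_migrations(TRACE, preferred={"a"}))  # "a" preferred for SRAM
```

In this trace, the unhinted run first fills block `a` into STT-RAM and later migrates it once its write count reaches the threshold, whereas the hinted run places `a` in SRAM on its first miss and never migrates it. The model ignores latency, energy, set associativity, and write-back traffic, so it only illustrates why steering migration-prone blocks toward SRAM can lower the migration count.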
