Energy efficient 3D Hybrid processor-memory architecture for the dark silicon age

As continued technology scaling increases the number of cores on chip multiprocessors (CMPs), more cache resources are needed to feed all the cores. Improving performance by reducing off-chip memory accesses therefore requires larger on-chip caches. Within the on-chip cache hierarchy, the last-level cache (LLC) is the largest level and consumes more energy than the other levels in many-core CMPs, because LLC leakage power has become a significant contributor to the overall chip power budget in the deep sub-micron and dark silicon era. In this paper, we focus on exploiting non-volatile memory (NVM), a new type of memory with promising features, in shared distributed LLCs to decrease leakage power consumption and mitigate the dark silicon phenomenon. In our proposed strategy, we first calculate the Average Memory Access Time (AMAT) of the applications running on the CMP in each predetermined interval from the collected system memory traffic. Based on the monitored AMATs, we then adaptively reconfigure the hybrid distributed LLC at runtime by selecting the proper memory type (i.e., SRAM bank or STT-RAM bank). Experimental results on the PARSEC benchmarks show that the proposed method provides up to 55.22% (39.3% on average) energy reduction and a 35.33% average energy-delay product (EDP) improvement with only 6% performance degradation compared to conventional methods in which a single cache technology is used.
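The two steps of the strategy, computing an AMAT per interval and then choosing a bank type, can be sketched as below. This is a minimal illustration and not the paper's implementation: the abstract does not give the AMAT formula, the counters, or the selection rule, so the textbook definition (hit time plus miss rate times miss penalty) is assumed, and `IntervalStats`, `select_bank`, and the threshold are hypothetical names and parameters. The direction of the rule (SRAM when AMAT is high, STT-RAM otherwise) is also an assumption.

```python
from dataclasses import dataclass


@dataclass
class IntervalStats:
    """Hypothetical per-interval LLC counters gathered from memory traffic."""
    llc_accesses: int
    llc_misses: int
    hit_cycles: float           # assumed LLC hit latency, in cycles
    miss_penalty_cycles: float  # assumed off-chip access penalty, in cycles


def amat(stats: IntervalStats) -> float:
    """Textbook AMAT: hit time + miss rate * miss penalty (assumed formula)."""
    if stats.llc_accesses == 0:
        return 0.0  # no traffic in this interval
    miss_rate = stats.llc_misses / stats.llc_accesses
    return stats.hit_cycles + miss_rate * stats.miss_penalty_cycles


def select_bank(amat_cycles: float, threshold_cycles: float) -> str:
    """Assumed policy: fast SRAM when AMAT is high, low-leakage STT-RAM otherwise."""
    return "SRAM" if amat_cycles > threshold_cycles else "STT-RAM"


if __name__ == "__main__":
    stats = IntervalStats(llc_accesses=1000, llc_misses=250,
                          hit_cycles=10.0, miss_penalty_cycles=200.0)
    interval_amat = amat(stats)
    print(interval_amat, select_bank(interval_amat, threshold_cycles=40.0))
```

A single fixed threshold is only the simplest way to turn a monitored AMAT into a decision. The actual policy may weigh other factors, such as write intensity, that the abstract does not describe.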
