Runtime 3-D stacked cache management for chip-multiprocessors

Three-dimensional (3-D) memory stacking is one of the most promising solutions to the memory bandwidth problem in chip multiprocessors. In this work, we propose an efficient runtime 3-D cache management technique that not only takes advantage of the low memory access latency of vertical interconnections, but also exploits the memory access demand of applications, which varies dynamically at runtime. Experimental results show that the proposed method improves performance by up to 26.7%, and by 13.1% on average, compared with a private stacked cache configuration.
