Exploring DRAM Last Level Cache for 3D Network-on-Chip Architecture

In this paper, we implement and analyze different Network-on-Chip (NoC) designs with Static Random Access Memory (SRAM) Last Level Cache (LLC) and Dynamic Random Access Memory (DRAM) LLC. Different 2D/3D NoCs with SRAM/DRAM are modeled based on state-of-the-art chips. The impact of integrating DRAM cache into a NoC platform is discussed. We explore the advantages and disadvantages of DRAM cache for NoC in terms of access latency, cache size, area and power consumption. We present benchmark results using a cycle accurate full system simulator based on realistic workloads. Experiments show that under different workloads, the average cache hit latencies in two DRAM based designs are increased by 12.53% (2D) and reduced by 27.97% (3D) respectively compared with the SRAM. It is also shown that the power consumption is a tradeoff consideration in improving the cache hit latency of DRAM LLC. Overall, the power consumption of 3D NoC design with DRAM LLC has reduced 25.78% compared with the SRAM design. Our analysis and experimental results provide a guideline to design efficient 3D NoCs with DRAM LLC.

[1]  Saurabh Dighe,et al.  An 80-Tile 1.28TFLOPS Network-on-Chip in 65nm CMOS , 2007, 2007 IEEE International Solid-State Circuits Conference. Digest of Technical Papers.

[2]  Johannes G. Janzen Calculating Memory System Power for DDR SDRAM , 2001 .

[3]  Marc Tremblay,et al.  A Third-Generation 65nm 16-Core 32-Thread Plus 32-Scout-Thread CMT SPARC® Processor , 2008, 2008 IEEE International Solid-State Circuits Conference - Digest of Technical Papers.

[4]  Li Zhao,et al.  Exploring DRAM cache architectures for CMP server platforms , 2007, 2007 25th International Conference on Computer Design.

[5]  Max B Aron The single-chip cloud computer , 2010 .

[6]  Kurt Keutzer,et al.  Getting to the bottom of deep submicron , 1998, ICCAD '98.

[7]  Doug Burger,et al.  An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches , 2002, ASPLOS X.

[8]  Yiran Chen,et al.  A novel architecture of the 3D stacked MRAM L2 cache for CMPs , 2009, 2009 IEEE 15th International Symposium on High Performance Computer Architecture.

[9]  Alfonso Casas Martín Intel Core i7-980X Extreme Edition A 3,33 GHZ: primera CPU con 6 núcleos , 2010 .

[10]  Lei Jiang,et al.  Die Stacking (3D) Microarchitecture , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).

[11]  Gabriel H. Loh,et al.  Implementing caches in a 3D technology for high performance processors , 2005, 2005 International Conference on Computer Design.

[12]  Hannu Tenhunen,et al.  A study of 3D Network-on-Chip design for data parallel H.264 coding , 2009 .

[13]  Sharad Malik,et al.  Orion: a power-performance simulator for interconnection networks , 2002, MICRO.

[14]  Theodore R. Bashkow,et al.  A large scale, homogeneous, fully distributed parallel machine, I , 1977, ISCA '77.

[15]  Rajeev Balasubramonian,et al.  Optimizing communication and capacity in a 3D stacked reconfigurable cache hierarchy , 2009, 2009 IEEE 15th International Symposium on High Performance Computer Architecture.

[16]  Kanad Ghose,et al.  Energy-efficient MESI cache coherence with pro-active snoop filtering for multicore microprocessors , 2008, Proceeding of the 13th international symposium on Low power electronics and design (ISLPED '08).

[17]  Kai Li,et al.  The PARSEC benchmark suite: Characterization and architectural implications , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).

[18]  Kaustav Banerjee,et al.  A thermally-aware performance analysis of vertically integrated (3-D) processor-memory hierarchy , 2006, 2006 43rd ACM/IEEE Design Automation Conference.

[19]  Hannu Tenhunen,et al.  A study of 3D Network-on-Chip design for data parallel H.264 coding , 2009, 2009 NORCHIP.

[20]  Kai Li,et al.  PARSEC vs. SPLASH-2: A quantitative comparison of two multithreaded benchmark suites on Chip-Multiprocessors , 2008, 2008 IEEE International Symposium on Workload Characterization.

[21]  Luca Benini,et al.  Networks on Chips : A New SoC Paradigm , 2022 .

[22]  Shyamkumar Thoziyoor,et al.  CACTI 5 . 1 , 2008 .

[23]  Krisztián Flautner,et al.  PicoServer: using 3D stacking technology to enable a compact energy efficient chip multiprocessor , 2006, ASPLOS XII.

[24]  Hannu Tenhunen,et al.  An analysis of designing 2D/3D chip multiprocessor wit different cache architecture , 2010, NORCHIP 2010.

[25]  Gabriel H. Loh,et al.  3D-Stacked Memory Architectures for Multi-core Processors , 2008, 2008 International Symposium on Computer Architecture.

[26]  R. Schaller,et al.  Technological innovation in the semiconductor industry: A case study of the International Technology Roadmap for Semiconductors (ITRS) , 2001, PICMET '01. Portland International Conference on Management of Engineering and Technology. Proceedings Vol.1: Book of Summaries (IEEE Cat. No.01CH37199).

[27]  Fredrik Larsson,et al.  Simics: A Full System Simulation Platform , 2002, Computer.

[28]  Anoop Gupta,et al.  The SPLASH-2 programs: characterization and methodological considerations , 1995, ISCA.