An Equal Area Comparison of Embedded DRAM and SRAM Memory Architectures for a Chip Multiprocessor

Recent architectures in academia and industry have explored placing multiple processors on a single chip, but no consensus has emerged on the memory architecture for such designs. The recent availability of embedded DRAM (EDRAM) has further complicated the choice. In this investigation, we present a new and comprehensive comparison of four very different memory configurations within a single framework: SRAM configured as cache, SRAM configured as pageable memory, EDRAM configured as cache, and EDRAM configured as pageable memory. In addition, the experiments investigate tradeoffs between two levels of on-chip memory under a constant silicon area: as the level-one capacity increases, the level-two capacity decreases. Placing four processors on a single die, each with its own set of level-one caches, amplifies these memory tradeoffs.
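
The equal-area constraint can be illustrated with a minimal sketch. It assumes a simple linear area model in which every kilobyte of a given technology costs a fixed amount of silicon. The function name, the area budget, and the per-kilobyte area figures (including the 4:1 SRAM-to-EDRAM ratio) are hypothetical placeholders chosen for illustration, not values taken from this study.

```python
def l2_capacity_kb(total_area, n_cpus, l1_kb_per_cpu, l1_area_per_kb, l2_area_per_kb):
    """Return the level-two capacity (KB) that fits in the area left over
    after each processor's level-one memory has been placed.

    Hypothetical linear model: area = capacity_kb * area_per_kb.
    """
    l1_area = n_cpus * l1_kb_per_cpu * l1_area_per_kb
    remaining = total_area - l1_area
    if remaining < 0:
        raise ValueError("level-one memories alone exceed the area budget")
    return remaining / l2_area_per_kb


# All numbers below are illustrative assumptions, in arbitrary area units.
TOTAL_AREA = 1024.0        # assumed fixed on-chip memory area budget
SRAM_AREA_PER_KB = 1.0     # assumed SRAM cost per KB
EDRAM_AREA_PER_KB = 0.25   # assumed EDRAM cost per KB (placeholder density ratio)
N_CPUS = 4                 # four processors on one die, as in the abstract

if __name__ == "__main__":
    for l1_kb in (8, 16, 32, 64):
        sram_l2 = l2_capacity_kb(TOTAL_AREA, N_CPUS, l1_kb,
                                 SRAM_AREA_PER_KB, SRAM_AREA_PER_KB)
        edram_l2 = l2_capacity_kb(TOTAL_AREA, N_CPUS, l1_kb,
                                  SRAM_AREA_PER_KB, EDRAM_AREA_PER_KB)
        print(f"L1 = {l1_kb:3d} KB per CPU: "
              f"SRAM L2 = {sram_l2:.0f} KB, EDRAM L2 = {edram_l2:.0f} KB")
```

Under these assumed figures, each increase in per-processor level-one capacity is multiplied by the processor count before it is subtracted from the level-two budget, which is why a four-processor die makes the tradeoff more pronounced than a single-processor one.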
