Paper Information - An Equal Area Comparison of Embedded DRAM and SRAM Memory Architectures for a Chip Multiprocessor

Recent architectures in academia and industry have explored placing multiple processors on a single chip, but a consensus has not emerged on the memory architecture. The recent availability of embedded DRAM (EDRAM) has further complicated the formula. In this investigation, we present a new and comprehensive comparison of four very different memory technologies in the same framework: SRAM cache, SRAM configured as pageable memory, EDRAM configured as cache, and EDRAM configured as pageable memory. In addition, these experiments investigate tradeoffs between two levels of on-chip memory, given constant silicon area: as the level one capacity increases, the level two capacity decreases. Having four processors on a single die, each with its own set of level one caches, helps exaggerate the effective memory tradeoffs.

Stephen Richardson | Stuart Siu | Paul Keltcher
