A case for intelligent RAM

Two trends call into question the current practice of fabricating microprocessors and DRAMs as different chips on different fabrication lines. First, the gap between processor and DRAM speed is growing at 50% per year. Second, the size and organization of memory on a single DRAM chip are becoming awkward to use, even as capacity grows at 60% per year. Intelligent RAM, or IRAM, merges processing and memory into a single chip to lower memory latency, increase memory bandwidth, and improve energy efficiency. It also allows more flexible selection of memory size and organization, and promises savings in board area. This article reviews the state of microprocessors and DRAMs today, explores some of the opportunities and challenges for IRAMs, and finally estimates the performance and energy efficiency of three IRAM designs.
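To give a feel for the two growth rates quoted above, the following minimal sketch compounds them over a few years. It assumes the rates stay constant and compound annually, which the abstract does not state explicitly; the function name `compound_growth` and the chosen time horizons are illustrative only.

```python
def compound_growth(rate_per_year: float, years: int) -> float:
    """Multiplicative factor after `years` of growth at a constant annual rate.

    Assumption: the rate compounds once per year and does not change.
    """
    return (1.0 + rate_per_year) ** years


if __name__ == "__main__":
    # 0.50 = processor-DRAM speed gap growth, 0.60 = DRAM capacity growth,
    # both taken from the abstract; the horizons below are arbitrary.
    for years in (1, 3, 5, 10):
        gap = compound_growth(0.50, years)
        capacity = compound_growth(0.60, years)
        print(f"{years:2d} years: speed gap x{gap:.1f}, capacity x{capacity:.1f}")
```

Under these assumptions, 60% annual growth corresponds to roughly a fourfold increase in capacity every three years, while the speed gap more than triples over the same period.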
