A Prototype Processing-In-Memory (PIM) Chip for the Data-Intensive Architecture (DIVA) System

The Data-Intensive Architecture (DIVA) system employs Processing-In-Memory (PIM) chips as smart-memory coprocessors. This architecture exploits inherent memory bandwidth both on chip and across the system to target several classes of bandwidth-limited applications, including multimedia applications and pointer-based and sparse-matrix computations. The DIVA project has built a prototype development system using PIM chips in place of standard DRAMs to demonstrate these concepts. We have recently ported several demonstration kernels to this platform and have measured a speedup of 35X on a matrix transpose operation. This paper focuses on the 32-bit scalar and 256-bit WideWord integer processing components of the first DIVA prototype PIM chip, which was fabricated in TSMC 0.18 μm technology. In conjunction with other publications, this paper demonstrates that impressive gains can be achieved with very little "smart" logic added to memory devices. A second PIM prototype that includes WideWord floating-point capability is scheduled to tape out in August 2003.
