PIM Lite: On the Road Towards Relentless Multi-threading in Massively Parallel Systems

Processing In Memory (PIM) technology, which mixes significant processing logic with dense memory on the same chip, has become a popular trend in recent years. In many cases, however, it has been used simply as a step towards a "system on a chip." This paper assumes that PIM systems will be inherently massively parallel, with many chips collaborating in a computation, perhaps in concert with more conventional microprocessors. While such systems could be designed to support "classical" parallel models such as DSM or message passing, this paper discusses several different models born from the HTMT project. All of these models involve significant multi-threading, with large numbers of relatively lightweight threads executing within the PIM nodes. To take advantage of these characteristics, we have designed a new ISA and matching microarchitecture that support such multi-threading in ways that efficiently leverage the enhanced local bandwidth and access time available from an on-chip memory macro. A simplified version of this design, termed PIM Lite, is about to go to fab as a memory part with multiple internal nodes, all of which support very lightweight threads in a simple SMT microarchitecture. This paper discusses PIM Lite and then our outlook on what more advanced designs might look like.
