A decoupled scheduled dataflow multithreaded architecture

This paper proposes a new approach to building multithreaded uniprocessors that can serve as building blocks in high-end computing architectures. The novelty lies in a multithreaded architecture with non-blocking threads, in which all memory accesses are decoupled from thread execution. Data is pre-loaded into the thread context (registers), and all results are post-stored after the thread completes execution. Decoupling memory accesses from thread execution requires a separate unit to perform the necessary pre-loads and post-stores and to control the allocation of hardware thread contexts to enabled threads. This separation helps achieve high locality and minimizes the impact of distribution and hierarchy in large memory systems. The non-blocking nature of threads eliminates the need for thread switching, thus reducing the overhead of scheduling threads. The functional execution paradigm eliminates the complex hardware required for scheduling instructions in modern superscalar architectures. We present preliminary results obtained from Monte Carlo simulations of the proposed architectural features.
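The pre-load, execute, post-store discipline described above can be illustrated with a small software model. This is only a sketch under stated assumptions, not the proposed hardware: the names `Thread` and `run`, the three queues, and the rule that each stage handles one thread per step are all hypothetical, and the enabling and synchronization of threads is not modeled. The point it shows is that a thread body touches only its registers, so it never waits on memory once it starts.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Thread:
    """Hypothetical thread descriptor; field names are assumptions."""
    name: str
    inputs: Dict[str, int]    # register name -> memory address to pre-load
    outputs: Dict[str, int]   # register name -> memory address to post-store
    body: Callable[[Dict[str, int]], None]  # operates on registers only
    registers: Dict[str, int] = field(default_factory=dict)


def run(memory: List[int], threads: List[Thread]) -> List[str]:
    """Move threads through pre-load, execute, and post-store stages.

    Only the pre-load and post-store stages touch `memory`; the execute
    stage runs each thread body to completion on its register context.
    """
    trace: List[str] = []
    preload_q = deque(threads)   # threads assumed already enabled
    ready_q: deque = deque()     # registers loaded, waiting to execute
    poststore_q: deque = deque() # finished, results still in registers

    while preload_q or ready_q or poststore_q:
        # Stages are visited back to front so a thread advances one
        # stage per step, loosely mimicking two overlapping units.
        if poststore_q:          # memory-access unit: post-store
            t = poststore_q.popleft()
            for reg, addr in t.outputs.items():
                memory[addr] = t.registers[reg]
            trace.append(f"poststore {t.name}")
        if ready_q:              # execution unit: no memory access here
            t = ready_q.popleft()
            t.body(t.registers)  # runs to completion, never blocks
            poststore_q.append(t)
            trace.append(f"execute {t.name}")
        if preload_q:            # memory-access unit: pre-load
            t = preload_q.popleft()
            t.registers = {reg: memory[addr] for reg, addr in t.inputs.items()}
            ready_q.append(t)
            trace.append(f"preload {t.name}")
    return trace


# Two independent example threads over a four-word memory.
memory = [3, 4, 0, 0]
threads = [
    Thread("A", {"x": 0, "y": 1}, {"z": 2},
           lambda r: r.update(z=r["x"] + r["y"])),
    Thread("B", {"x": 0}, {"z": 3},
           lambda r: r.update(z=r["x"] * 10)),
]
trace = run(memory, threads)
print(trace)
print(memory)
```

In this model the post-store of thread A and the execution of thread B happen in the same step, which is the kind of overlap between memory access and computation that the decoupling is meant to enable.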
