Comparing Execution Performance of Scheduled Dataflow With RISC Processors

In this paper we describe a new approach to designing multithreaded architectures that can be used as basic building blocks in high-end computing systems. Our architecture uses a non-blocking multithreaded model based on the dataflow paradigm. In addition, all memory accesses are decoupled from thread execution: data is pre-loaded into the thread context (registers), and all results are post-stored after the thread completes execution. Decoupling memory accesses from thread execution requires a separate unit to perform the necessary pre-loads and post-stores and to control the allocation of hardware thread contexts to enabled threads. Because threads are non-blocking, fewer context switches are needed, which reduces the overhead of scheduling threads. Our functional execution paradigm eliminates the complex hardware required for dynamic instruction scheduling in modern superscalar architectures. We present preliminary results obtained from an instruction set simulator using several benchmark programs, and we compare the execution performance of our architecture with that of the MIPS architecture as modeled by the DLX simulator.
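As a rough illustration of the execution model described above, the following Python sketch models a thread as three separate phases: pre-load into registers, execution on registers only, and post-store of results. It is a toy under stated assumptions, not the instruction set simulator used for the results: the names `Thread`, `run`, `loads`, `stores`, and the register labels are hypothetical, and the enabling rule (a thread is enabled once all of its input addresses hold a value) is a simplification of dataflow synchronization.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Thread:
    """Hypothetical thread descriptor; not the architecture's actual format."""
    name: str
    loads: Dict[str, str]                    # register -> memory address to pre-load
    body: Callable[[Dict[str, int]], None]   # touches registers only, never memory
    stores: Dict[str, str]                   # memory address -> register to post-store


def run(threads: List[Thread], memory: Dict[str, int]) -> List[Tuple[str, str]]:
    """Run each thread to completion once all of its inputs are available."""
    trace: List[Tuple[str, str]] = []
    waiting = deque(threads)
    while waiting:
        progressed = False
        for _ in range(len(waiting)):
            t = waiting.popleft()
            # Assumed enabling rule: every input address must already hold a value.
            if not all(addr in memory for addr in t.loads.values()):
                waiting.append(t)
                continue
            # Pre-load phase: copy inputs from memory into the thread context.
            regs = {reg: memory[addr] for reg, addr in t.loads.items()}
            trace.append((t.name, "pre-load"))
            # Execute phase: non-blocking, operates on registers only.
            t.body(regs)
            trace.append((t.name, "execute"))
            # Post-store phase: write results back after execution completes.
            for addr, reg in t.stores.items():
                memory[addr] = regs[reg]
            trace.append((t.name, "post-store"))
            progressed = True
        if not progressed:
            raise RuntimeError("no thread is enabled; inputs are missing")
    return trace


def add_body(regs: Dict[str, int]) -> None:
    regs["r2"] = regs["r0"] + regs["r1"]


def double_body(regs: Dict[str, int]) -> None:
    regs["r1"] = regs["r0"] * 2


memory = {"a": 3, "b": 4}
threads = [
    # Listed first, but not enabled until "sum" has been post-stored.
    Thread("double", {"r0": "sum"}, double_body, {"out": "r1"}),
    Thread("add", {"r0": "a", "r1": "b"}, add_body, {"sum": "r2"}),
]
trace = run(threads, memory)
print(memory["out"])  # 14
```

In this sketch the thread body never touches memory, so once a thread starts it cannot stall on a load; the `double` thread simply stays disabled until the `add` thread has post-stored its result.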
