An elementary processor architecture with simultaneous instruction issuing from multiple threads

In this paper, we propose a multithreaded processor architecture that improves machine throughput. In our architecture, instructions from different threads (rather than from a single thread) are issued simultaneously to multiple functional units, and these instructions can begin execution unless there are functional unit conflicts. This parallel execution scheme greatly improves the utilization of the functional units. Simulation results show that executing two and four threads in parallel on a nine-functional-unit processor achieves speed-ups of 2.02 and 3.72 times, respectively, over a conventional RISC processor. Our architecture is also applicable to the efficient execution of a single loop. To control functional unit conflicts between loop iterations, we have developed a new static code scheduling technique. Another loop execution scheme, which uses the multiple control flow mechanism of our architecture, makes it possible to parallelize loops that are difficult to parallelize on vector or VLIW machines.
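The issue scheme described in the abstract can be illustrated with a toy cycle-level model. The sketch below is not the paper's simulator: the function name `simulate`, the unit names `"alu"` and `"load"`, the single-cycle latency, the limit of one issued instruction per thread per cycle, and the rotating thread priority are all assumptions made for illustration.

```python
from collections import Counter


def simulate(threads, units):
    """Count the cycles needed to issue every instruction of every thread.

    threads: list of instruction streams; each instruction is just the name
             of the functional-unit type it needs (hypothetical encoding).
    units:   dict mapping a unit type to how many such units exist.

    Assumptions (not from the paper): in-order issue, at most one
    instruction per thread per cycle, every unit is free again next cycle.
    """
    for stream in threads:
        for need in stream:
            if units.get(need, 0) < 1:
                raise ValueError(f"no functional unit of type {need!r}")

    n = len(threads)
    pc = [0] * n          # next instruction index for each thread
    cycles = 0
    while any(pc[t] < len(threads[t]) for t in range(n)):
        free = Counter(units)                      # units available this cycle
        order = [(cycles + i) % n for i in range(n)]  # rotate thread priority
        for t in order:
            if pc[t] == len(threads[t]):
                continue                           # thread already finished
            need = threads[t][pc[t]]
            if free[need] > 0:                     # no conflict: issue
                free[need] -= 1
                pc[t] += 1
            # otherwise the thread stalls this cycle on a unit conflict
        cycles += 1
    return cycles


if __name__ == "__main__":
    a = ["alu", "load", "alu"]
    b = ["alu", "alu", "load"]
    one_alu = {"alu": 1, "load": 1}
    two_alus = {"alu": 2, "load": 1}
    print(simulate([a], one_alu) + simulate([b], one_alu))  # threads run back to back
    print(simulate([a, b], one_alu))                        # interleaved, one ALU
    print(simulate([a, b], two_alus))                       # interleaved, two ALUs
```

In this toy setting, running the two streams together takes fewer cycles than running them back to back, and adding a second ALU removes the remaining conflicts. The numbers say nothing about the speed-ups reported above, which come from the authors' own simulation.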
