The superthreaded architecture: thread pipelining with run-time data dependence checking and control speculation

This peeper presents a new concurrent multiple-threaded architectural model, called superthreading, for exploiting thread-level parallelism on a processor. This architectural model adopts a thread pipelining execution model that allows threads with data dependences and control dependences to be executed in parallel. The basic idea of thread pipelining is to compute and forward recurrence data and possible dependent store addresses to the next thread as soon as possible, so the next thread can start execution and perform run-time data dependence checking. Thread pipelining also forces contiguous threads to perform their memory write-backs in order, which enables the compiler to fork threads with control speculation. With run-time support for data dependence checking and control speculation, the superthreaded architectural model can exploit loop-level parallelism from a broad range of applications.

[1]  Chuan-lin Wu,et al.  A Benchmark Evaluation of a Multi-threaded RISC Processor Architecture , 1991, ICPP.

[2]  Y. Patt,et al.  Single instruction stream parallelism is greater than two , 1991, [1991] Proceedings. The 18th Annual International Symposium on Computer Architecture.

[3]  Gurindar S. Sohi,et al.  The Expandable Split Window Paradigm for Exploiting Fine-grain Parallelism , 1992, [1992] Proceedings the 19th Annual International Symposium on Computer Architecture.

[4]  Gurindar S. Sohi,et al.  Multiscalar processors , 1995, Proceedings 22nd Annual International Symposium on Computer Architecture.

[5]  Kozo Kimura,et al.  An elementary processor architecture with simultaneous instruction issuing from multiple threads , 1992, ISCA '92.

[6]  ScalesHunter,et al.  Single instruction stream parallelism is greater than two , 1991 .

[7]  Hwa C. Torng,et al.  The Concurrent Execution of Multiple Instruction Streams on Superscalar Processors , 1991, ICPP.

[8]  Harry F. Jordan Performance measurements on HEP - a pipelined MIMD computer , 1983, ISCA '83.

[9]  Gurindar S. Sohi,et al.  The expandable split window paradigm for exploiting fine-grain parallelsim , 1992, ISCA '92.

[10]  Andrew Wolfe,et al.  A variable instruction stream extension to the VLIW architecture , 1991, ASPLOS IV.

[11]  Lori Pollock,et al.  An experimental study of several cooperative register allocation and instruction scheduling strategies , 1995, MICRO 1995.

[12]  Mike Johnson,et al.  Superscalar microprocessor design , 1991, Prentice Hall series in innovative technology.

[13]  William J. Dally,et al.  Processor coupling: integrating compile time and runtime scheduling for parallelism , 1992, ISCA '92.

[14]  Kevin O'Brien,et al.  Single-program speculative multithreading (SPSM) architecture: compiler-assisted fine-grained multithreading , 1995, PACT.

[15]  Monica S. Lam,et al.  Limits of control flow on parallelism , 1992, ISCA '92.

[16]  Krishna Subramanian,et al.  Enhanced modulo scheduling for loops with conditional branches , 1992, MICRO 1992.

[17]  Grant E. Haab,et al.  Enhanced Modulo Scheduling For Loops With Conditional Branches , 1992, [1992] Proceedings the 25th Annual International Symposium on Microarchitecture MICRO 25.

[18]  Andrew R. Pleszkun,et al.  Strategies for achieving improved processor throughput , 1991, ISCA '91.

[19]  Allan Porterfield,et al.  The Tera computer system , 1990 .

[20]  William J. Dally,et al.  The M-machine multicomputer , 1997, Proceedings of the 28th Annual International Symposium on Microarchitecture.

[21]  Robert H. Halstead,et al.  MASA: a multithreaded processor architecture for parallel symbolic computing , 1988, [1988] The 15th Annual International Symposium on Computer Architecture. Conference Proceedings.

[22]  David W. Wall,et al.  Limits of instruction-level parallelism , 1991, ASPLOS IV.

[23]  Dean M. Tullsen,et al.  Simultaneous multithreading: Maximizing on-chip parallelism , 1995, Proceedings 22nd Annual International Symposium on Computer Architecture.