Program Optimization for Concurrent Multithreaded Architectures

This paper presents compiler and program transformation techniques for concurrent multithreaded architectures, in particular the superthreaded architecture [1], which adopts a thread pipelining execution model that allows threads with data dependences and control dependences to execute in parallel. We identify several important program analysis and transformation techniques that allow the superthreaded architecture to exploit more parallelism in programs with less run-time overhead. We evaluate the performance of the superthreaded architecture and the effectiveness of these transformations by manually compiling several benchmark programs and running them through a trace-driven, cycle-by-cycle superthreaded processor simulator. The simulation results show that, when the proposed transformations are applied, a superthreaded processor achieves promising speedups for most of the benchmark programs.
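The core idea of thread pipelining, that a thread carrying a dependence on its predecessor can still overlap with it, can be illustrated with a minimal software sketch. It assumes a loop whose iterations split into a small loop-carried part (the hypothetical `step`) and a larger independent part (the hypothetical `work`). Each iteration runs as its own thread, waits only for the loop-carried value from its predecessor, forwards its own value to the successor immediately, then does its independent work and writes back results in original program order. The names `pipelined_loop`, `sequential_loop`, `step` and `work` are illustrative assumptions; this is a software analogue using queues and events, not the superthreaded processor's hardware mechanism, and it does not model control speculation or run-time dependence checking.

```python
import queue
import threading
from typing import Callable, List

# Hypothetical signatures (assumptions for illustration only):
#   step(acc, i) -> new loop-carried value produced by iteration i
#   work(acc, i) -> result of the independent part of iteration i
Step = Callable[[int, int], int]
Work = Callable[[int, int], int]


def sequential_loop(n: int, step: Step, work: Work, init: int = 0) -> List[int]:
    """Reference semantics: the original sequential loop."""
    acc = init
    results: List[int] = []
    for i in range(n):
        acc = step(acc, i)
        results.append(work(acc, i))
    return results


def pipelined_loop(n: int, step: Step, work: Work, init: int = 0) -> List[int]:
    """Run each iteration as a thread that forwards the loop-carried value
    to its successor as early as possible (software analogue only)."""
    # forward[i] delivers the loop-carried value produced by iteration i-1.
    forward = [queue.Queue(maxsize=1) for _ in range(n + 1)]
    # ready_to_commit[i] is set once iterations 0..i-1 have written back.
    ready_to_commit = [threading.Event() for _ in range(n + 1)]
    ready_to_commit[0].set()
    results: List[int] = []

    def iteration(i: int) -> None:
        acc = step(forward[i].get(), i)  # block only on the dependent value
        forward[i + 1].put(acc)          # forward it so iteration i+1 can proceed
        value = work(acc, i)             # independent part overlaps with successors
        ready_to_commit[i].wait()        # write back in original program order
        results.append(value)
        ready_to_commit[i + 1].set()

    forward[0].put(init)
    threads = [threading.Thread(target=iteration, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    def running_sum(acc: int, i: int) -> int:
        return acc + i  # loop-carried dependence

    def square(acc: int, i: int) -> int:
        return acc * acc  # independent per-iteration work

    print(pipelined_loop(8, running_sum, square))  # → [0, 1, 9, 36, 100, 225, 441, 784]
    print(pipelined_loop(8, running_sum, square) == sequential_loop(8, running_sum, square))  # → True
```

Because of CPython's global interpreter lock, this sketch shows the dependence and ordering structure rather than an actual speedup; the point is that iteration i+1 is released as soon as iteration i has produced its loop-carried value, not when iteration i finishes.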

[1] Jenn-Yuan Tsai et al. The superthreaded architecture: thread pipelining with run-time data dependence checking and control speculation. Proceedings of the 1996 Conference on Parallel Architectures and Compilation Techniques (PACT), 1996.

[2] Pen-Chung Yew et al. Statement re-ordering for DOACROSS loops. ICPP, 1994.

[3] Dean M. Tullsen et al. Simultaneous multithreading: maximizing on-chip parallelism. Proceedings of the 22nd Annual International Symposium on Computer Architecture (ISCA), 1995.

[4] Michael D. Smith et al. Tracing with Pixie. 1991.

[5] Gurindar S. Sohi et al. Multiscalar processors. Proceedings of the 22nd Annual International Symposium on Computer Architecture (ISCA), 1995.

[6] Lori Pollock et al. An experimental study of several cooperative register allocation and instruction scheduling strategies. MICRO, 1995.

[7] Kevin O'Brien et al. Single-program speculative multithreading (SPSM) architecture: compiler-assisted fine-grained multithreading. PACT, 1995.

[8] Kozo Kimura et al. An elementary processor architecture with simultaneous instruction issuing from multiple threads. Proceedings of the 19th Annual International Symposium on Computer Architecture (ISCA), 1992.

[9] Zhiyuan Li. Array privatization for parallel execution of loops. ICS, 1992.

[10] Gurindar S. Sohi et al. The expandable split window paradigm for exploiting fine-grain parallelism. ISCA, 1992.