Speculative Thread Execution in a Multithreaded Dataflow Architecture

Modern superscalar and VLIW processors extract instruction-level parallelism (ILP) through out-of-order execution, branch prediction, value prediction, and speculative execution of instructions. These techniques do not scale well, which has led to multithreaded and multi-core systems. However, such processors rely on compilers to automatically extract thread-level or task-level parallelism. Loop-carried dependencies and aliases caused by complex array subscripts and pointer data types limit compilers’ ability to parallelize code. Hardware support for thread-level speculation (TLS) allows compilers to parallelize programs more aggressively using speculative thread execution, since the hardware enforces the correct order of execution. In this paper, we show how thread-level speculation can be implemented within the context of our Scheduled Dataflow architecture and provide a preliminary performance analysis.
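To make the idea of thread-level speculation concrete, the following is a minimal software sketch, not the mechanism of the Scheduled Dataflow architecture. It assumes a hypothetical loop of the form a[idx_w[i]] = a[idx_r[i]] + 1, whose indirect subscripts prevent a compiler from proving that the iterations are independent. Every iteration runs speculatively against a snapshot of memory and buffers its write; iterations then commit in program order, and any iteration that read a location written by an earlier iteration is squashed and re-executed. The function names and the commit scheme are illustrative assumptions.

```python
def run_sequential(a, idx_r, idx_w):
    """Reference semantics: a[idx_w[i]] = a[idx_r[i]] + 1, in program order."""
    a = list(a)
    for r, w in zip(idx_r, idx_w):
        a[w] = a[r] + 1
    return a


def run_speculative(a, idx_r, idx_w):
    """Hypothetical TLS model: speculate on a snapshot, commit in order.

    Returns (final_array, number_of_squashed_iterations).
    """
    memory = list(a)
    snapshot = list(memory)  # state that every speculative iteration starts from

    # Speculative phase: each iteration logs its read and buffers its write.
    # These iterations are independent of one another and could run in parallel.
    buffered = [(r, w, snapshot[r] + 1) for r, w in zip(idx_r, idx_w)]

    # Commit phase: strictly in program order.
    written = set()  # locations written by already committed iterations
    squashed = 0
    for r, w, value in buffered:
        if r in written:
            # An earlier iteration wrote the location this one read, so the
            # speculative value may be stale: squash and re-execute.
            squashed += 1
            value = memory[r] + 1
        memory[w] = value
        written.add(w)
    return memory, squashed


if __name__ == "__main__":
    data = [0, 0, 0, 0, 0, 0]
    reads = [0, 1, 2, 3]
    writes = [1, 5, 3, 4]
    print(run_sequential(data, reads, writes))
    print(run_speculative(data, reads, writes))
```

In this sketch, iterations whose reads do not overlap earlier writes commit without re-execution, which is where the speculative parallelism comes from; only the dependent iterations pay the cost of a squash.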
