A feasibility study of hierarchical multithreading

Many studies have shown that significant amounts of parallelism exist at different granularities. Execution models such as superscalar and VLIW exploit parallelism from a single thread. Multithreaded processors take a step towards exploiting parallelism from different threads, but are not geared to exploit parallelism at different granularities (fine and medium grain). We present a feasibility study of a new execution model for exploiting both adjacent and distant parallelism in the dynamic instruction stream. Our model, called hierarchical multithreading, uses a two-level hierarchical arrangement of processing elements. The lower level of the hierarchy exploits instruction-level parallelism and fine-grain thread-level parallelism, whereas the upper level exploits more distant parallelism. Detailed studies with a cycle-accurate simulator are presented, showing the feasibility of hierarchical multithreading. Conclusions are drawn about the best ways to get the most from the hierarchical multithreading scheme.
