A minimal dual-core speculative multi-threading architecture

Speculative multi-threading (SpMT) can improve single-threaded application performance by using the multiple thread contexts available in current processors. We propose a minimal SpMT model that uses only two thread contexts. The model achieves significant speedups for single-threaded applications with two mechanisms. The first is a low-overhead scheme for detecting and selectively recovering from data dependence violations. The second is a novel wrong-path predictor that reduces the number of speculative threads executing along the wrong path. We also study the interactions between three previously proposed SpMT thread-spawning policies that can be implemented dynamically in hardware (Fork on Call, Loop Continuation, and Run Ahead). We show that it is beneficial to implement all three policies together in a processor. Individually, the three policies show performance benefits of 14%, 5%, and 4%, respectively, on our SpMT model over a base processor that does not exploit SpMT. Combined, they yield an average performance gain of 20%. Finally, we identify the sources of SpMT benefits. On average, 58% of the performance gain from SpMT comes from cache prefetching, 33% from instruction reuse, and 9% from branch precomputation. We show that all three sources must be exploited to realize the full potential of SpMT.
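
The abstract does not describe how the detection and selective-recovery scheme works internally. As a rough illustration only, the Python sketch below models one common way to do selective recovery. Instead of squashing the whole speculative thread, it re-executes only the instructions that consumed a wrong value and keeps the results of everything else. The names `Inst`, `violated_loads`, and `selective_reexecution` are hypothetical. The model assumes a violation is either a speculative load whose address was later written by the older thread, or a mispredicted live-in register. It does not reflect how the proposed hardware actually tracks addresses or schedules re-execution.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Set, Tuple


@dataclass
class Inst:
    """One instruction in a speculative thread's trace (hypothetical model)."""
    dest: Optional[str]              # destination register, or None
    srcs: Tuple[str, ...]            # source registers read by the instruction
    load_addr: Optional[int] = None  # memory address read, if this is a load


def violated_loads(trace: List[Inst], committed_stores: Iterable[int]) -> Set[int]:
    """Indices of speculative loads whose address was later written by the
    older, non-speculative thread, i.e. memory dependence violations."""
    stores = set(committed_stores)
    return {i for i, inst in enumerate(trace)
            if inst.load_addr is not None and inst.load_addr in stores}


def selective_reexecution(trace: List[Inst], committed_stores: Iterable[int],
                          bad_live_ins: Iterable[str] = ()) -> List[int]:
    """Indices of instructions that must re-execute: violated loads, readers of
    mispredicted live-in registers, and their transitive dependents.
    Every other instruction keeps its speculative result."""
    redo = violated_loads(trace, committed_stores)
    tainted = set(bad_live_ins)  # registers whose current value is wrong
    for i, inst in enumerate(trace):
        if i in redo or any(r in tainted for r in inst.srcs):
            redo.add(i)
            if inst.dest is not None:
                tainted.add(inst.dest)
        elif inst.dest is not None:
            # A correctly computed result overwrites the register,
            # so later readers of it see a good value.
            tainted.discard(inst.dest)
    return sorted(redo)


# Usage: the load into r1 is violated by an older store to 0x100.
trace = [
    Inst(dest="r1", srcs=(), load_addr=0x100),  # 0: r1 <- [0x100]
    Inst(dest="r2", srcs=("r1",)),              # 1: r2 <- f(r1)
    Inst(dest="r3", srcs=("r4",)),              # 2: r3 <- g(r4), independent
    Inst(dest="r5", srcs=("r2", "r3")),         # 3: r5 <- r2 + r3
    Inst(dest="r6", srcs=(), load_addr=0x200),  # 4: r6 <- [0x200]
]
print(selective_reexecution(trace, committed_stores=[0x100]))  # [0, 1, 3]
```

In this example only three of the five instructions are redone. Instructions 2 and 4 do not depend on the violated load, so their results are kept. The sketch ignores memory dependences between stores and loads inside the speculative thread itself.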
