Capping Speculative Traces to Improve Performance in Simultaneous Multi-threading CPUs

Simultaneous Multi-Threading (SMT) improves the overall performance of superscalar CPUs by letting multiple independent threads execute concurrently while sharing key datapath components, so that processor resources are better utilized. Speculative execution helps modern processors exploit more Instruction-Level Parallelism. However, the performance penalty of a misspeculation is much more pronounced in an SMT environment than in a traditional multi-threading system, because the shared resources are wasted at clock-cycle granularity rather than at thread granularity. In this paper, we show that instructions fetched because of incorrect predictions can account for more than 30% of all instructions, a large waste of resources that other, non-speculative threads could have put to better use. To reduce this waste, we propose a technique that controls the number of speculative instructions dispatched into the Issue Queue (IQ), the most critical shared resource in the SMT pipeline. Simulation results show that the proposed technique reduces the resources wasted on misspeculated traces by 38% and improves overall throughput (IPC) by up to 17%.
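The idea of limiting speculative instructions at dispatch can be illustrated with a minimal sketch. The abstract does not specify the actual capping policy, so everything below is an assumption made for illustration: the `IssueQueue` class, the fixed per-thread `spec_cap` threshold, and the simplification that a thread has at most one unresolved branch at a time are all hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class IssueQueue:
    """Toy shared issue queue that tracks speculative entries per thread.

    Hypothetical model: the paper's real policy is not described in the abstract.
    """
    capacity: int   # total IQ entries shared by all threads
    spec_cap: int   # assumed per-thread cap on speculative entries
    entries: list = field(default_factory=list)  # (thread_id, is_speculative)

    def speculative_count(self, thread_id: int) -> int:
        """Number of speculative entries currently held by one thread."""
        return sum(1 for t, spec in self.entries if t == thread_id and spec)

    def try_dispatch(self, thread_id: int, is_speculative: bool) -> bool:
        """Return True if the instruction was admitted into the queue."""
        if len(self.entries) >= self.capacity:
            return False  # queue full: no thread can dispatch
        if is_speculative and self.speculative_count(thread_id) >= self.spec_cap:
            return False  # cap reached: hold back this thread's speculative trace
        self.entries.append((thread_id, is_speculative))
        return True

    def resolve_branch(self, thread_id: int, mispredicted: bool) -> int:
        """Resolve the thread's pending branch; return how many entries were squashed."""
        before = len(self.entries)
        if mispredicted:
            # Wrong path: flush the thread's speculative entries.
            self.entries = [(t, s) for t, s in self.entries
                            if not (t == thread_id and s)]
        else:
            # Correct path: the thread's entries are no longer speculative.
            self.entries = [(t, False) if t == thread_id else (t, s)
                            for t, s in self.entries]
        return before - len(self.entries)
```

In this toy model, a thread that keeps fetching down a predicted path can occupy at most `spec_cap` IQ entries with speculative instructions, so the remaining entries stay available to other threads, and a misprediction squashes at most `spec_cap` entries for that thread.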