Improving IPC in simultaneous multi-threading (SMT) processors by capping IQ utilization according to dispatched memory instructions

Simultaneous multithreading (SMT) improves the resource utilization and performance of superscalar CPUs by sharing key data-path components among multiple independent threads. Because thread behavior is unstable, using critical shared resources effectively across threads is a challenge for SMT. One of the most critical shared resources in the pipeline is the Issue Queue (IQ), and limiting each thread's occupancy of it improves overall throughput. However, to accommodate the transient behavior of each thread, the limit (cap) must be set properly in real time to avoid under-utilization (and thus under-achieving) due to over-capping, or starvation of some threads due to under-capping. In this paper, a simple dynamic algorithm is proposed that adjusts the cap value for each thread in real time according to the number of memory instructions of that thread. Simulation results show a considerable improvement in IPC over the regular no-capping technique, and performance superior even to the fixed-capping approach.
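The abstract does not give the actual update rule, so the following is only a minimal sketch of what a per-thread, memory-instruction-driven IQ cap could look like. It assumes an inverse relation (a thread with more dispatched memory instructions receives a smaller cap), and the names `adjust_caps`, `may_dispatch`, and `min_cap` are hypothetical rather than taken from the paper.

```python
def adjust_caps(iq_size, mem_instr_counts, min_cap=4):
    """Return one IQ cap per thread.

    ASSUMPTION: the cap shrinks as the thread's count of dispatched
    memory instructions grows; the paper's real rule may differ.
    """
    # Weight each thread inversely to its memory-instruction count.
    # The +1 avoids division by zero for threads with no memory instructions.
    weights = [1.0 / (1 + m) for m in mem_instr_counts]
    total = sum(weights)
    # Scale the weights to the IQ size, and keep a floor (min_cap) so that
    # no thread is starved entirely.
    return [max(min_cap, int(iq_size * w / total)) for w in weights]


def may_dispatch(thread_id, iq_occupancy, caps):
    """A thread may dispatch only while its IQ occupancy is below its cap."""
    return iq_occupancy[thread_id] < caps[thread_id]


if __name__ == "__main__":
    # Two threads sharing a 64-entry IQ; thread 0 has dispatched
    # 3 memory instructions in the current interval, thread 1 none.
    caps = adjust_caps(64, [3, 0])
    print(caps)
    print(may_dispatch(0, [12, 20], caps), may_dispatch(1, [12, 20], caps))
```

In a simulator, a rule of this kind would be re-evaluated periodically (for example every few cycles) so that the caps follow each thread's transient behavior. The floor value plays the role of guarding against starvation from an overly restrictive cap.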
