Paper information - Improving IPC in simultaneous multi-threading (SMT) processors by capping IQ utilization according to dispatched memory instructions


Simultaneous multithreading (SMT) improves the resource utilization and performance of superscalar CPUs by sharing key data-path components among multiple independent threads. Because thread behavior is unstable, using critical resources effectively across threads is a challenge for SMT. One of the most critical shared resources in the pipeline is the Issue Queue (IQ), so limiting each thread's occupancy of it improves overall throughput. However, to accommodate the transient behavior of each thread, the limit (cap) must be set properly in real time to avoid under-utilization (and thus under-achievement) due to over-capping, or starvation of some threads due to under-capping. In this paper, a simple dynamic algorithm is proposed to adjust the cap value for each thread in real time according to the number of memory instructions dispatched by each thread. Simulation results show that the proposed method achieves a considerable improvement in IPC over the regular no-capping technique and even outperforms the fixed-capping approach.
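The abstract does not state the actual update rule, so the following is only an illustrative sketch of what a per-thread dynamic IQ cap driven by dispatched memory instructions might look like. Everything in it is an assumption made for illustration: the names `ThreadState`, `update_cap`, and `may_dispatch`, the `min_cap` parameter, the linear mapping, and even the direction of the relationship (a larger share of memory instructions yielding a lower cap). It is not the algorithm from the paper.

```python
from dataclasses import dataclass


@dataclass
class ThreadState:
    """Hypothetical per-thread counters sampled over one adjustment interval."""
    iq_occupancy: int = 0    # IQ entries currently held by this thread
    dispatched: int = 0      # instructions dispatched in the interval
    dispatched_mem: int = 0  # of those, how many were loads or stores


def update_cap(t: ThreadState, iq_size: int, min_cap: int) -> int:
    """Assumed rule: scale the cap linearly from iq_size down to min_cap
    as the fraction of dispatched memory instructions goes from 0 to 1."""
    if t.dispatched == 0:
        return iq_size  # no information yet, so leave the thread uncapped
    mem_ratio = t.dispatched_mem / t.dispatched
    cap = round(iq_size - mem_ratio * (iq_size - min_cap))
    return max(min_cap, min(iq_size, cap))


def may_dispatch(t: ThreadState, cap: int) -> bool:
    """A thread may dispatch into the IQ only while it is below its cap."""
    return t.iq_occupancy < cap


if __name__ == "__main__":
    thread = ThreadState(iq_occupancy=38, dispatched=100, dispatched_mem=50)
    cap = update_cap(thread, iq_size=64, min_cap=16)
    print(cap, may_dispatch(thread, cap))
```

In this sketch the cap is recomputed once per interval, and the dispatch stage consults `may_dispatch` before allocating an IQ entry. A fixed-capping scheme would correspond to returning a constant from `update_cap`, and no capping to always returning `iq_size`.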
