Predictable performance in SMT processors

Current instruction fetch policies in SMT processors are oriented towards optimizing overall throughput and/or fairness. However, they provide no control over how individual threads are executed. This leads to performance unpredictability, since the IPC of a thread depends on the workload it is executed in and on the fetch policy used.

From the point of view of the Operating System (OS), it is the job scheduler that determines how jobs are executed. However, when the OS runs on an SMT processor, this performance unpredictability means the job scheduler cannot guarantee the execution time constraints of any job.

In this paper we propose a novel kind of collaboration between the OS and the SMT hardware that enables the OS to enforce that a high-priority thread runs at a specific fraction of its full speed. We present an extensive evaluation using many different workloads, which shows that this mechanism gives the required performance in more than 97% of all cases considered, and in more than 99% of the less extreme cases. At the same time, our mechanism does not need to trade off predictability against overall throughput, as it maximizes the IPC of the remaining low-priority threads. It achieves 94% on average (97.5% on average for the less extreme cases) of the throughput obtained using instruction fetch policies oriented toward throughput maximization, such as ICOUNT.
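To make the two quantities in the abstract concrete, the following minimal sketch shows how a "fraction of full speed" requirement and a relative throughput figure could be computed from measured IPC values. It is not the mechanism proposed in the paper: the function names, the assumption that "full speed" means the IPC of the thread when it runs alone, and all numbers are hypothetical and chosen only for illustration.

```python
def target_ipc(full_speed_ipc: float, fraction: float) -> float:
    """IPC the high-priority thread must reach.

    Assumption: "full speed" is the IPC the thread achieves when it
    runs alone on the processor; `fraction` is the share requested
    by the OS (for example 0.5 for half of full speed).
    """
    if not 0.0 < fraction <= 1.0:
        raise ValueError("fraction must be in (0, 1]")
    return fraction * full_speed_ipc


def meets_requirement(measured_ipc: float, full_speed_ipc: float,
                      fraction: float) -> bool:
    """True if the measured IPC reaches the requested fraction."""
    return measured_ipc >= target_ipc(full_speed_ipc, fraction)


def relative_throughput(ipc_under_policy: float,
                        ipc_under_baseline: float) -> float:
    """Throughput under one policy relative to a baseline policy
    (for example a throughput-oriented fetch policy)."""
    if ipc_under_baseline <= 0.0:
        raise ValueError("baseline IPC must be positive")
    return ipc_under_policy / ipc_under_baseline


if __name__ == "__main__":
    # Hypothetical numbers, not taken from the paper.
    alone_ipc = 2.0   # IPC of the high-priority thread running alone
    requested = 0.5   # OS asks for 50% of full speed
    print(target_ipc(alone_ipc, requested))
    print(meets_requirement(1.2, alone_ipc, requested))
    print(relative_throughput(3.0, 4.0))
```

Under these assumptions, a case counts as successful when `meets_requirement` returns true for the high-priority thread, and the throughput cost of predictability is read off `relative_throughput` for the workload as a whole.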
