SMT Layout Overhead and Scalability

Simultaneous Multi-Threading (SMT) is a hardware technique that increases processor throughput by issuing instructions simultaneously from multiple threads. Although SMT can be added to an existing microarchitecture with relatively low overhead, the chip area it consumes could instead be spent on other resources such as more functional units, larger caches, or better branch predictors. How large is the SMT overhead, and at what point does SMT stop paying off for maximum throughput compared with adding other architectural features? This paper evaluates the silicon overhead of SMT through a transistor/interconnect-level analysis of the layout. We discuss microarchitectural issues that affect SMT implementations and show that the Instruction Set Architecture (ISA) and the microarchitecture can have a large effect on both SMT overhead and performance. Results show that SMT yields large performance gains with small to moderate area overhead.
