A complexity-effective simultaneous multithreading architecture

Different applications may exhibit radically different behaviors and thus have very different requirements in terms of hardware support. In simultaneous multithreading (SMT) architectures, the hardware is shared among multiple running applications in order to better profit from it. However, current architectures are designed for the common case, and try to satisfy a number of different application classes with a single design. That is, current designs are usually overdesigned for most cases, obtaining high performance, but wasting a lot of resources to do so. In this paper we present an alternative SMT architecture, the heterogeneously distributed SMT (hdSMT). Our architecture is based in a novel combination of SMT and clustering techniques in a heterogeneity-aware fashion. The hardware is designed to match the heterogeneous application behavior with the statically and heterogeneously partitioned resources. Such a design is aimed for minimizing the amount of resources wasted to achieve a given performance rate. On top of our statically partitioned architecture, we propose a heuristic policy to map threads to clusters so that each cluster matches the characteristics of the running threads and overall hardware usage is optimized. We compare our hdSMT architecture with a monolithic SMT processor, where all threads compete for the same resources, and with a homogeneous clustered SMT, where resources are statically and equally partitioned across clusters. Our results show that hdSMT architectures obtain an average improvement of 13% and 14% in optimizing performance per area over monolithic SMT and homogeneously clustered SMT respectively.

[1]  David H. Albonesi,et al.  Front-end policies for improved issue efficiency in SMT processors , 2003, The Ninth International Symposium on High-Performance Computer Architecture, 2003. HPCA-9 2003. Proceedings..

[2]  Norman P. Jouppi,et al.  Single-ISA Heterogeneous Multi-Core Architectures: The Potential for Processor Power Reduction , 2003, MICRO.

[3]  José González,et al.  Back-end assignment schemes for clustered multithreaded processors , 2004, ICS '04.

[4]  Kunle Olukotun,et al.  The case for a single-chip multiprocessor , 1996, ASPLOS VII.

[5]  Jean-Luc Gaudiot,et al.  Quantifying the SMT layout overhead-does SMT pull its weight? , 2000, Proceedings Sixth International Symposium on High-Performance Computer Architecture. HPCA-6 (Cat. No.PR00550).

[6]  Steven K. Reinhardt,et al.  The impact of resource partitioning on SMT processors , 2003, 2003 12th International Conference on Parallel Architectures and Compilation Techniques.

[7]  Dean M. Tullsen,et al.  Simultaneous multithreading: Maximizing on-chip parallelism , 1995, Proceedings 22nd Annual International Symposium on Computer Architecture.

[8]  Seong-Won Lee,et al.  Clustered Microarchitecture Simultaneous Multithreading , 2003, Euro-Par.

[9]  Theo Ungerer,et al.  Transistor count and chip-space estimation of simplescalar-based microprocessor model , 2001 .

[10]  Norman P. Jouppi,et al.  Single-ISA heterogeneous multi-core architectures for multithreaded workload performance , 2004, Proceedings. 31st Annual International Symposium on Computer Architecture, 2004..

[11]  Brad Calder,et al.  Basic block distribution analysis to find periodic behavior and simulation points in applications , 2001, Proceedings 2001 International Conference on Parallel Architectures and Compilation Techniques.

[12]  Mario Nemirovsky,et al.  Increasing superscalar performance through multistreaming , 1995, PACT.

[13]  Dean M. Tullsen,et al.  Clustered multithreaded architectures - pursuing both IPC and cycle time , 2004, 18th International Parallel and Distributed Processing Symposium, 2004. Proceedings..

[14]  Jean-Luc Gaudiot,et al.  SMT Layout Overhead and Scalability , 2002, IEEE Trans. Parallel Distributed Syst..

[15]  Francisco J. Cazorla,et al.  Dcache Warn: an I-fetch policy to increase SMT efficiency , 2004, 18th International Parallel and Distributed Processing Symposium, 2004. Proceedings..

[16]  Dean M. Tullsen,et al.  Handling long-latency loads in a simultaneous multithreading processor , 2001, Proceedings. 34th ACM/IEEE International Symposium on Microarchitecture. MICRO-34.

[17]  Kunle Olukotun,et al.  A Single-Chip Multiprocessor , 1997, Computer.

[18]  T. Ungerer,et al.  - 1-On Performance , Transistor Count and Chip Space Assessment of Multimedia-enhanced Simultaneous Multithreaded Processors , 2000 .

[19]  Jack L. Lo,et al.  Exploiting Choice: Instruction Fetch and Issue on an Implementable Simultaneous Multithreading Processor , 1996, 23rd Annual International Symposium on Computer Architecture (ISCA'96).