The shared-thread multiprocessor

This paper describes initial results for an architecture called the Shared-Thread Multiprocessor (STMP). The STMP combines features of a multithreaded processor and a chip multiprocessor; specifically, it enables distinct cores on a chip multiprocessor to share thread state. This shared thread state allows the system to schedule threads from a shared pool onto individual cores, allowing for rapid movement of threads between cores. This paper demonstrates and evaluates three benefits of this architecture: (1) By providing more thread state storage than available in the cores themselves, the architecture enjoys the ILP benefits of many threads, but carries the in-core complexity of supporting just a few. (2) Threads can move between cores fast enough to hide long-latency events such as memory accesses. This enables very-short-term load balancing in response to such events. (3) The system can redistribute threads to maximize symbiotic behavior and balance load much more often than traditional operating system thread scheduling and context switching.

[1]  Paraskevas Evripidou,et al.  Chip multiprocessor based on data-driven multithreading model , 2007, Int. J. High Perform. Syst. Archit..

[2]  Balaram Sinharoy,et al.  Design and implementation of the POWER5 microprocessor , 2004, Proceedings. 41st Design Automation Conference, 2004..

[3]  Arvind,et al.  Executing a Program on the MIT Tagged-Token Dataflow Architecture , 1987, IEEE Trans. Computers.

[4]  Gaurav Mittal,et al.  Design of the Power6 Microprocessor , 2007, 2007 IEEE International Solid-State Circuits Conference. Digest of Technical Papers.

[5]  Dean M. Tullsen,et al.  Exploiting Choice: Instruction Fetch and Issue on an Implementable Simultaneous Multithreading Processor , 1996, 23rd Annual International Symposium on Computer Architecture (ISCA'96).

[6]  Dean M. Tullsen,et al.  Simultaneous multithreading: Maximizing on-chip parallelism , 1995, Proceedings 22nd Annual International Symposium on Computer Architecture.

[7]  Susan J. Eggers,et al.  Thread-Sensitive Scheduling for SMT Processors , 2000 .

[8]  Arvind Data Flow Languages and Architecture , 1981, ISCA.

[9]  Dean M. Tullsen,et al.  Symbiotic jobscheduling for a simultaneous mutlithreading processor , 2000, SIGP.

[10]  Kunle Olukotun,et al.  Niagara: a 32-way multithreaded Sparc processor , 2005, IEEE Micro.

[11]  Santosh G. Abraham,et al.  Chip multithreading: opportunities and challenges , 2005, 11th International Symposium on High-Performance Computer Architecture.

[12]  Josep Torrellas,et al.  Evaluating the Performance of Cache-Affinity Scheduling in Shared-Memory Multiprocessors , 1995, J. Parallel Distributed Comput..

[13]  Arvind Data flow languages and architectures , 1981, ISCA '81.

[14]  Timothy Johnson,et al.  An 8-core, 64-thread, 64-bit power efficient sparc soc (niagara2) , 2007, ISPD '07.

[15]  Dean M. Tullsen,et al.  Fellowship - Simulation And Modeling Of A Simultaneous Multithreading Processor , 1996, Int. CMG Conference.

[16]  Randy H. Katz,et al.  Implementing a cache consistency protocol , 1985, ISCA '85.

[17]  Brad Calder,et al.  Balanced Multithreading: Increasing Throughput via a Low Cost Multithreading Hierarchy , 2004, 37th International Symposium on Microarchitecture (MICRO-37'04).

[18]  Arvind,et al.  Executing a Program on the MIT Tagged-Token Dataflow Architecture , 1990, IEEE Trans. Computers.

[19]  David A. Koufaty,et al.  Hyperthreading Technology in the Netburst Microarchitecture , 2003, IEEE Micro.

[20]  Yiannakis Sazeides,et al.  Performance implications of single thread migration on a chip multi-core , 2005, CARN.

[21]  Dean M. Tullsen,et al.  Handling long-latency loads in a simultaneous multithreading processor , 2001, MICRO.