Clustered multithreaded architectures - pursuing both IPC and cycle time

Summary form only given. Clustering is an architectural technique that allows the design of wide superscalar processors without sacrificing cycle time, but at the cost of longer communication latencies. Simultaneous multithreading architectures effectively tolerate instruction latency, but put even more pressure on timing-critical processor resources. We show that the synergistic combination of the two techniques minimizes the IPC impact of the clustered architecture, and even permits more aggressive clustering of the processor than is possible with a single-threaded processor. Additionally, we show that multithreading enables effective instruction steering policies unavailable to a single-threaded clustered architecture. We explore the impact, on a simultaneous multithreading processor, of aggressively clustering four complex processor structures: (1) instruction window wakeup and functional unit bypass logic, (2) register renaming logic, (3) the fetch unit, and (4) the integer register file.
