Exploiting Choice: Instruction Fetch and Issue on an Implementable Simultaneous Multithreading Processor

Simultaneous multithreading is a technique that permits multiple independent threads to issue multiple instructions each cycle. In previous work we demonstrated the performance potential of simultaneous multithreading, based on a somewhat idealized model. In this paper we show that the throughput gains from simultaneous multithreading can be achieved without extensive changes to a conventional wide-issue superscalar, either in hardware structures or sizes. We present an architecture for simultaneous multithreading that achieves three goals: (1) it minimizes the architectural impact on the conventional superscalar design, (2) it has minimal performance impact on a single thread executing alone, and (3) it achieves significant throughput gains when running multiple threads. Our simultaneous multithreading architecture achieves a throughput of 5.4 instructions per cycle, a 2.5-fold improvement over an unmodified superscalar with similar hardware resources. This speedup is enhanced by an advantage of multithreading previously unexploited in other architectures: the ability to favor for fetch and issue those threads most efficiently using the processor each cycle, thereby providing the "best" instructions to the processor.
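The idea of favoring certain threads for fetch each cycle can be illustrated with a small sketch. It is not taken from the abstract and is not the authors' implementation: it assumes one plausible heuristic, in which the threads with the fewest instructions currently in flight in the front end are treated as using the processor most efficiently. The names `ThreadState`, `in_flight`, `stalled`, and `pick_fetch_threads` are hypothetical, as is the choice of two threads per cycle.

```python
from dataclasses import dataclass


@dataclass
class ThreadState:
    tid: int               # hardware thread (context) id
    in_flight: int         # assumed metric: instructions not yet issued
    stalled: bool = False  # assumed flag: thread cannot fetch this cycle


def pick_fetch_threads(threads, max_threads=2):
    """Return the ids of up to max_threads threads to fetch from this cycle.

    Threads that cannot fetch are skipped. The rest are ordered by
    in-flight count, lowest first, with the thread id as a tie-breaker.
    """
    ready = [t for t in threads if not t.stalled]
    ready.sort(key=lambda t: (t.in_flight, t.tid))
    return [t.tid for t in ready[:max_threads]]


threads = [
    ThreadState(tid=0, in_flight=12),
    ThreadState(tid=1, in_flight=3),
    ThreadState(tid=2, in_flight=7, stalled=True),
    ThreadState(tid=3, in_flight=5),
]
print(pick_fetch_threads(threads))  # [1, 3]
```

Here thread 2 is skipped because it is stalled, and threads 1 and 3 are chosen over thread 0 because they have fewer instructions waiting. A real fetch unit would make this choice in hardware every cycle; the sketch only shows the selection logic under the stated assumptions.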
