Multithreaded Processors

The instruction-level parallelism found in a conventional instruction stream is limited. Studies have shown that processor utilization is limited even in today’s superscalar microprocessors. One solution is to additionally exploit more coarse-grained parallelism. The main approaches are the (single) chip multiprocessor and the multithreaded processor, both of which optimize the throughput of multiprogramming workloads rather than single-thread performance. The chip multiprocessor integrates two or more complete processors on a single chip. Every unit of a processor is duplicated and used independently of its copies on the chip. In contrast, the multithreaded processor is able to pursue two or more threads of control in parallel within the processor pipeline. Unused instruction slots, which arise when a contemporary microprocessor executes a single-threaded program in its pipeline, are filled in a multithreaded processor by instructions of other threads. The execution units are multiplexed between the threads whose contexts are held in the register sets. Underutilization of a superscalar processor due to missing instruction-level parallelism can be overcome by simultaneous multithreading, where a processor can issue multiple instructions from multiple threads each cycle. Simultaneous multithreaded processors combine the multithreading technique with a wide-issue superscalar processor, so that the full issue bandwidth can be utilized by issuing instructions from different threads simultaneously. This survey paper explains and classifies the various multithreading techniques in research and in commercial microprocessors, and compares multithreaded processors with chip multiprocessors.
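
The slot-filling idea behind simultaneous multithreading can be illustrated with a toy model. The sketch below is not taken from the survey: the function name `utilization`, the per-cycle "ready instruction" counts, and the fixed thread priority order are all assumptions made for illustration, and instructions that are not issued in a cycle are simply dropped rather than carried over.

```python
def utilization(ready_per_cycle, issue_width):
    """Return the fraction of issue slots filled over all cycles.

    ready_per_cycle: one list per thread; entry c is the number of
    independent instructions that thread could issue in cycle c
    (hypothetical input, not measured data).
    issue_width: number of issue slots available per cycle.
    """
    cycles = len(ready_per_cycle[0])
    filled = 0
    for c in range(cycles):
        free = issue_width
        # Threads are served in a fixed priority order (an assumption);
        # each takes as many of the remaining slots as it can use.
        for thread in ready_per_cycle:
            take = min(free, thread[c])
            filled += take
            free -= take
    return filled / (issue_width * cycles)


# Two hypothetical threads with little instruction-level parallelism each.
thread_a = [1, 0, 2, 1]
thread_b = [2, 1, 0, 3]

single = utilization([thread_a], issue_width=4)          # one thread alone
smt = utilization([thread_a, thread_b], issue_width=4)   # both share the slots
print(single, smt)  # 0.25 0.625
```

In this toy setting the single thread fills 4 of 16 slots, while letting the second thread use the leftover slots fills 10 of 16. A real processor must also handle dependences, resource conflicts, and fetch policy, none of which is modeled here.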
