Hardware Support for Multithreaded Execution of Loops with Limited Parallelism

Loop scheduling on multithreaded processors differs significantly from loop scheduling on other parallel processors. The sharing of hardware resources imposes new scheduling limitations, but it also allows faster communication across threads. We present a multithreaded processor model, Coral 2000, with hardware extensions that support Macro Software Pipelining, a loop scheduling technique for multithreaded processors. We tested and evaluated Coral 2000 on a cycle-level simulator, using synthetic and integer SPEC benchmarks. On loops that exhibit limited parallelism, we obtained speedups of up to 30% relative to highly optimized superblock-based schedules.
