Dynamic Barrier Architecture for Multi-Mode Fine-Grain Parallelism Using Conventional Processors

Parallel computers constructed using conventional processors offer the potential to achieve large improvements in execution speed at reasonable cost; however, these machines tend to implement only coarse-grain MIMD parallelism efficiently. To achieve the best possible speedup through parallel execution, a computer must be capable of effectively using all the different types of parallelism that exist in each program. A combination of SIMD, VLIW, and MIMD parallelism, at a variety of granularity levels, exists in most applications; thus, hardware that can support multiple types of parallelism can achieve better performance on a wider range of codes. In this paper, we introduce a new hardware barrier architecture that provides the full dynamic barrier MIMD (DBM) functionality we discussed in [11], but can be implemented with much simpler hardware. This mechanism can be used to efficiently support multi-mode, moderate-width parallelism with instruction-level granularity (i.e., synchronization cost is approximately that of one LOAD instruction).

[1]  M. Auguin, et al. The OPSILA computer, 1986.

[2]  Rajiv Gupta. The fuzzy barrier: a mechanism for high speed synchronization of processors, 1989, ASPLOS III.

[3]  H. F. Jordan. A Special Purpose Architecture for Finite Element Analysis, 1978.

[4]  Michael Philippsen, et al. Project Triton: towards improved programmability of parallel machines, 1993, Proceedings of the Twenty-sixth Hawaii International Conference on System Sciences.

[5]  Howard Jay Siegel, et al. Instruction execution trade-offs for SIMD vs. MIMD vs. mixed mode parallelism, 1991, Proceedings of the Fifth International Parallel Processing Symposium.

[6]  Henry G. Dietz, et al. PAPERS: Purdue's Adapter for Parallel Execution and Rapid Synchronization, 1994.

[7]  Daniel W. Watson. Compile-time selection of parallel modes in an SIMD/SPMD heterogeneous parallel environment, 1993.

[8]  Robert P. Colwell, et al. A VLIW architecture for a trace scheduling compiler, 1987, ASPLOS.

[9]  Henry G. Dietz, et al. Hardware Barrier Synchronization: Static Barrier MIMD (SBM), 1990, ICPP.

[10]  Harry F. Jordan, et al. Comparing barrier algorithms, 1989, Parallel Comput.

[11]  R. Sarnath, et al. Proceedings of the International Conference on Parallel Processing, 1992.

[12]  G. H. Barnes, et al. A controllable MIMD architecture, 1986.

[13]  Howard Jay Siegel, et al. Data Management and Control-Flow Aspects of an SIMD/SPMD Parallel Language/Compiler, 1993, IEEE Trans. Parallel Distributed Syst.

[14]  Ronan Keryell, et al. Activity Counter: New Optimization for the Dynamic Scheduling of SIMD Control Flow, 1993, 1993 International Conference on Parallel Processing - ICPP'93.

[15]  Henry G. Dietz, et al. Static synchronization beyond VLIW, 1989, Proceedings of the 1989 ACM/IEEE Conference on Supercomputing (Supercomputing '89).

[16]  Constantine D. Polychronopoulos. Compiler Optimizations for Enhancing Parallelism and Their Impact on Architecture Design, 1988, IEEE Trans. Computers.

[17]  Stephen F. Lundstrom, et al. Applications Considerations in the System Design of Highly Concurrent Multiprocessors, 1987, IEEE Transactions on Computers.

[18]  Alexandru Nicolau, et al. Percolation scheduling for non-VLIW machines, 1990.

[19]  John R. Ellis, et al. Bulldog: A Compiler for VLIW Architectures, 1986.

[20]  Howard Jay Siegel, et al. Data management and control-flow constructs in a SIMD/SPMD parallel language/compiler, 1990, Proceedings of the Third Symposium on the Frontiers of Massively Parallel Computation.

[21]  Robert P. Colwell, et al. A VLIW architecture for a trace scheduling compiler, 1987, ASPLOS 1987.

[22]  Hiroki Honda, et al. A Multi-Grain Parallelizing Compilation Scheme for OSCAR (Optimally Scheduled Advanced Multiprocessor), 1991, LCPC.