This paper considers hardware support for the exploitation of control parallelism on data parallel architectures. It is well known that data parallel algorithms may also possess control parallel structure. However, splitting the control stream introduces data dependency and synchronization issues that were handled implicitly in conventional SIMD architectures. These include synchronization of access to scalar and parallel variables, and synchronization for parallel communication operations. We propose a sharing mechanism for scalar variables and identify a strategy that allows synchronization of scalar variables between multiple streams. The techniques considered are based on a bit-interleaved register file structure which allows fast copy between register sets. Hardware cost estimates and timing analyses are provided, and a comparison with an alternative scheme is presented. The register file structure has been designed and simulated for the HP 0.8 μm CMOS process, and circuit simulation indicates that access times are less than six nanoseconds. The impact of this structure on system performance is also studied.
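To make the register file idea concrete, the following is a minimal C sketch of a software model of a bit-interleaved register file, assuming an interleaved layout in which corresponding registers of different sets sit at adjacent locations. This is only an illustration of the organizing principle, not the paper's circuit design; all names and parameters (NUM_SETS, NUM_REGS, rf_copy_set, and so on) are hypothetical.

```c
/* Hypothetical software model of a bit-interleaved register file.
 * Assumption: registers from all sets are stored interleaved, so
 * corresponding registers of different sets are adjacent. In the
 * hardware version this adjacency is what permits a fast copy of
 * one whole register set to another. */
#include <stdint.h>
#include <stdio.h>

#define NUM_SETS 2      /* register sets, e.g., one per control stream */
#define NUM_REGS 32     /* registers per set */

/* Interleaved layout: register r of set s lives at index r*NUM_SETS + s. */
static uint32_t rf[NUM_REGS * NUM_SETS];

static uint32_t rf_read(int set, int reg)
{
    return rf[reg * NUM_SETS + set];
}

static void rf_write(int set, int reg, uint32_t v)
{
    rf[reg * NUM_SETS + set] = v;
}

/* Copy an entire register set; the strided loop stands in for what
 * interleaved cells would accomplish as one parallel transfer. */
static void rf_copy_set(int dst, int src)
{
    for (int r = 0; r < NUM_REGS; r++)
        rf[r * NUM_SETS + dst] = rf[r * NUM_SETS + src];
}

int main(void)
{
    rf_write(0, 5, 0xDEADBEEFu);  /* scalar produced by stream 0 */
    rf_copy_set(1, 0);            /* share register state with stream 1 */
    printf("%08X\n", rf_read(1, 5));
    return 0;
}
```

In this model, sharing a scalar between streams reduces to a set-to-set copy; the interleaved layout is the design choice that keeps that copy cheap.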