Parallel computing structures consist of many processors operating simultaneously. If a concurrent structure is regular, as in the case of a systolic array, it may be convenient to think of all processors as operating in lock step. This synchronized view, for example, often makes the definition of the structure and its correctness relatively easy to follow. However, large, totally synchronized systems controlled by central clocks are difficult to implement because of the inevitable problem of clock skews and delays. An alternative means of enforcing the necessary synchronization is the use of self-timed, asynchronous schemes, at the price of greater design complexity and additional hardware. Recognizing that different circumstances call for different synchronization methods, this paper provides a spectrum of synchronization models; based on the assumptions made for each model, theoretical lower bounds on clock skew are derived, and appropriate or best-possible synchronization schemes for systolic arrays are proposed. In general, this paper represents a first step towards a systematic study of synchronization problems for large systolic arrays. One set of models is based on assumptions that allow the use of a pipelined clocking scheme, in which more than one clock event is propagated along the clock line at a time. In this case, it is shown that even under the assumption that physical variations along clock lines can produce skews between wires of the same length, any one-dimensional systolic array can be correctly synchronized by a global pipelined clock while enjoying desirable properties such as modularity, expandability, and robustness in the synchronization scheme. This result cannot be extended to two-dimensional arrays, however: the paper shows that, under the same assumption, it is impossible to run a clock such that the maximum clock skew between two communicating cells remains bounded by a constant as the system grows.
For such cases, and for others where pipelined clocking is unworkable, a synchronization scheme incorporating both clocked and "asynchronous" elements is proposed.
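The one-dimensional result above can be illustrated with a small simulation. The sketch below is not from the paper; it merely models a linear array whose clock edge propagates cell to cell along a clock line with an uncertain per-segment delay (the assumed bounds `d_min`, `d_max` and the function name are hypothetical). It shows why pipelined clocking scales: the skew between any two *communicating* (adjacent) cells is a single segment delay and so stays bounded by `d_max`, while only the end-to-end latency, which no cell pair depends on, grows with the array length.

```python
import random

def clock_arrival_times(n_cells, d_min=0.9, d_max=1.1, seed=0):
    """Arrival time of one clock edge at each cell of a linear array.

    The edge travels down the clock line; each wire segment between
    adjacent cells has an uncertain delay drawn from [d_min, d_max],
    modeling physical variation between wires of the same length.
    """
    rng = random.Random(seed)
    t, arrivals = 0.0, []
    for _ in range(n_cells):
        arrivals.append(t)
        t += rng.uniform(d_min, d_max)  # delay of the next segment
    return arrivals

for n in (10, 100, 1000):
    a = clock_arrival_times(n)
    # Skew between adjacent (communicating) cells: one segment delay,
    # bounded by d_max regardless of how long the array grows.
    max_adjacent_skew = max(abs(a[i + 1] - a[i]) for i in range(n - 1))
    # End-to-end spread grows linearly with n, but no pair of
    # communicating cells ever observes it.
    end_to_end = a[-1] - a[0]
    print(n, round(max_adjacent_skew, 3), round(end_to_end, 1))
```

In a two-dimensional array the same argument fails, because a pair of communicating cells can sit on clock paths whose accumulated delays differ by an amount that grows with the array, which is the obstruction the paper formalizes.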