Transformation of broadcasting into pipelining. Research report

A characteristic shared by many computation-intensive algorithms is the repeated use of a few data values in a sequence of computations. An efficient parallel implementation of such data dependences often requires the simultaneous transfer, or broadcasting, of the data values to all the processors that need them. Unfortunately, direct realization of this broadcasting operation on VLSI processor arrays, especially on systolic arrays, usually results in severe performance degradation. This paper presents a technique for decomposing broadcasting dependences into propagation dependences at the algorithm level. Such propagation dependences, when physically realized, result in pipelining. The determination of a feasible propagation scheme is formulated as a linear algebra problem. It is proven that every broadcasting dependence can be decomposed into propagation dependences, and a systematic method for finding such decompositions is proposed.
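
As an illustration of the distinction between broadcasting and propagation, the following minimal Python sketch uses matrix-vector multiplication, a common textbook case. The example, the function names matvec_broadcast and matvec_pipelined, and the arrays xp and yp are assumptions chosen for illustration; they are not taken from the report and do not reproduce its method. In the first form every row reads x[j] directly, which corresponds to a broadcast; in the second, each x[j] enters at row 0 and is handed from row i-1 to row i, so every index point receives its operands only from a neighboring point.

```python
def matvec_broadcast(a, x):
    """Broadcast form: every row i reads the same x[j] directly."""
    n, m = len(a), len(x)
    y = [0] * n
    for i in range(n):
        for j in range(m):
            y[i] += a[i][j] * x[j]  # x[j] is needed by all rows at once
    return y


def matvec_pipelined(a, x):
    """Propagation form: x[j] is passed from point (i-1, j) to point (i, j)."""
    n, m = len(a), len(x)
    # xp[i][j] holds the copy of x[j] that has arrived at index point (i, j).
    xp = [[None] * m for _ in range(n)]
    # yp[i][j] holds the partial sum of row i after column j.
    yp = [[0] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            # x[j] enters the array at row 0 and then moves one row per step.
            xp[i][j] = x[j] if i == 0 else xp[i - 1][j]
            # The partial sum moves along the row from column j-1 to column j.
            prev = 0 if j == 0 else yp[i][j - 1]
            yp[i][j] = prev + a[i][j] * xp[i][j]
    # The last column of each row holds the finished result (0 for empty rows).
    return [row[-1] if row else 0 for row in yp]


if __name__ == "__main__":
    a = [[1, 2], [3, 4]]
    x = [5, 6]
    print(matvec_broadcast(a, x))
    print(matvec_pipelined(a, x))
```

Both functions compute the same product. In this example the propagation direction (1, 0) leaves the index j of x unchanged, which is the kind of linear condition a propagation scheme has to satisfy.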