Data broadcasting in linearly scheduled array processors

A major problem in executing algorithms on array processors is implementing broadcasts without unnecessarily degrading the speed-up factor. We discuss when and how broadcasts can be eliminated or reduced to easily implementable sequences of smaller local broadcasts. Algorithms are modelled as structured sets of indexed computations that operate on variables, each referenced through an indexing function. The discussion is restricted to variables with linear indexing functions and to algorithms linearly scheduled for execution on array processors. Linear indexing functions are represented as affine matrix functions of the algorithm's index set; the linear part of this representation is a coefficient matrix called the indexing matrix. Linear schedules are defined as linear time-space allocation functions that map the computations of an algorithm onto time steps and processors. We discuss necessary and sufficient conditions for the occurrence of broadcasts in a linearly scheduled algorithm. Necessary and sufficient conditions and constructive criteria are also given for selecting linear schedules for which all broadcasts are eliminated or reduced to sequences of small local broadcasts.
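To make the terms above concrete, the following minimal Python sketch enumerates a small index set and reports where a linear time schedule forces several computations to read the same datum at the same time step. It is an illustration under stated assumptions, not the conditions derived in the paper: the working definition of a broadcast (one datum needed by more than one computation in a single time step), the brute-force search, the matrix-multiplication example, and the names `affine_index`, `broadcast_groups`, `F_a` and `pi` are all hypothetical. Only the time component of the schedule is modelled; the processor allocation is left out.

```python
from collections import defaultdict
from itertools import product


def affine_index(F, c, j):
    """Element referenced at index point j by the affine function F*j + c."""
    return tuple(sum(f * x for f, x in zip(row, j)) + ci
                 for row, ci in zip(F, c))


def broadcast_groups(F, c, pi, n):
    """Group the index points of an n^d cube by (time step, element referenced).

    Assumption: a group with more than one index point is treated as a
    broadcast, because one datum is needed by several computations at once.
    """
    groups = defaultdict(list)
    for j in product(range(n), repeat=len(pi)):
        t = sum(p * x for p, x in zip(pi, j))  # linear time schedule pi . j
        groups[(t, affine_index(F, c, j))].append(j)
    return {key: pts for key, pts in groups.items() if len(pts) > 1}


# Hypothetical example: in matrix multiplication over index points (i, j, k),
# the operand a[i][k] has indexing matrix F_a and zero offset.
F_a = [[1, 0, 0],
       [0, 0, 1]]
c_a = [0, 0]

no_broadcast = broadcast_groups(F_a, c_a, (1, 1, 1), 3)
with_broadcast = broadcast_groups(F_a, c_a, (1, 0, 1), 3)
print(len(no_broadcast), len(with_broadcast))
```

In this example the schedule (1, 1, 1) gives every use of a given a[i][k] its own time step, whereas (1, 0, 1) ignores the j coordinate, so all uses of a[i][k] along j coincide in time. The exhaustive search is chosen for readability on small index sets only.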
