Partitioning of processor arrays: a piecewise regular approach

Abstract

This paper describes the systematic design of processor arrays with a given dimension and a given number of processing elements. This problem is called partitioning. A solution to the partitioning problem is described for mapping piecewise regular algorithms, i.e. algorithms with piecewise regular dependence graphs, onto processor arrays. These arrays are also piecewise regular: they are composed of a number of regularly connected homogeneous subarrays. Partitioning divides the dependence graph of a piecewise regular algorithm into tiles and schedules the corresponding operations on a processor array of fixed size and dimension. Different solutions to this problem are called partitioning schemes; they may be classified as projection, multiprojection, passive clustering and active clustering. The unified approach to partitioning presented here rests on the following concepts:

(1) Algorithms are represented by programs, which can be interpreted directly as hardware descriptions.
(2) The partitioning problem is solved by stepwise refinement, i.e. by applying a sequence of provably correct program transformations. These transformations operate essentially on index sets. Two program transformations are introduced:
(a) The EXPAND transformation partitions the iteration space of a given program into a direct sum of lattices, thereby increasing the dimension of the iteration space. In contrast to other approaches, nonperfect tilings can also be handled.
(b) The REDUCE transformation schedules the operations on a processor array of fixed size and dimension. It reduces the dimension of the iteration space and thereby the dimension of the processor array. Its parameters allow each of the different partitioning schemes to be realized.
(3) The whole solution is embedded in a methodology for the systematic design of processor arrays.
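The interplay of EXPAND (dimension goes up) and REDUCE (dimension goes down) can be illustrated on the simplest possible case. The Python sketch below is not the paper's formulation: it assumes a one-dimensional loop 0..n-1 with no data dependences, tiles that are intervals of equal length, and only two classical ways of choosing the processor coordinate, often called LSGP (locally sequential, globally parallel) and LPGS (locally parallel, globally sequential) in the systolic-array literature. No claim is made about how these two correspond to the four scheme classes named above. The functions `expand`, `reduce_to_array` and `check` are hypothetical names introduced for illustration.

```python
"""Toy EXPAND/REDUCE sketch for a 1-D loop without data dependences.

Hypothetical illustration only: names, tile shapes and the two schemes
are assumptions, not the paper's formulation.
"""
from typing import Dict, List, Tuple

Point = Tuple[int, int]  # (k, j): tile index k, offset j inside the tile


def expand(n: int, p: int) -> List[Point]:
    """Rewrite each index i in 0..n-1 as (k, j) with i = p*k + j, 0 <= j < p.

    The 1-D iteration space becomes 2-D. If p does not divide n, the last
    tile is only partly filled, i.e. the tiling is nonperfect.
    """
    return [divmod(i, p) for i in range(n)]


def reduce_to_array(n: int, num_pes: int, scheme: str) -> Dict[int, Point]:
    """Map every original index i to a (processor, time step) pair.

    'lsgp': choose p so that there are at most num_pes tiles; each tile runs
            sequentially on its own processor (processor = k, time = j).
    'lpgs': choose p = num_pes; tiles run one after another and the points
            of one tile run in parallel (processor = j, time = k).
    """
    if n <= 0 or num_pes <= 0:
        raise ValueError("n and num_pes must be positive")
    if scheme == "lsgp":
        p = -(-n // num_pes)  # integer ceil(n / num_pes)
    elif scheme == "lpgs":
        p = num_pes
    else:
        raise ValueError(f"unknown scheme: {scheme!r}")
    mapping: Dict[int, Point] = {}
    for i, (k, j) in enumerate(expand(n, p)):
        # Reading one coordinate as the processor index and the other as
        # time turns the 2-D space into a 1-D array plus a schedule.
        mapping[i] = (k, j) if scheme == "lsgp" else (j, k)
    return mapping


def check(mapping: Dict[int, Point], num_pes: int) -> int:
    """Return the number of time steps, or raise if the mapping is invalid."""
    slots = list(mapping.values())
    if len(set(slots)) != len(slots):
        raise ValueError("two operations share a (processor, time) slot")
    if any(not 0 <= pe < num_pes for pe, _ in slots):
        raise ValueError("processor index outside the fixed-size array")
    return 1 + max(t for _, t in slots)


if __name__ == "__main__":
    n, pes = 10, 4  # 10 iterations on a linear array of 4 processing elements
    for scheme in ("lsgp", "lpgs"):
        m = reduce_to_array(n, pes, scheme)
        print(scheme, "time steps:", check(m, pes))
```

With n = 10 and 4 processing elements, both schemes need 3 time steps, and the last tile is nonperfect in each case. For a dependence-free loop both always need ceil(n/P) steps on P processors, so in practice the choice between them is driven by data locality and by the dependence structure, which this sketch deliberately ignores.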