SAGA & CONDENSE: A TWO-PHASE APPROACH FOR THE IMPLEMENTATION OF RECURRENCE EQUATIONS

Our goal is to automate the parallel implementation of algorithms, specifically those arising in scientific and engineering applications. Since many of these computations may be formulated as coupled systems of recurrence equations, they exhibit a high degree of repetitiveness and regularity. We exploit these properties to generate efficient, and in certain cases optimal, schedules and processor assignments on particular multiprocessor configurations. The process of determining a parallel implementation consists of two successive stages, named SAGA and CONDENSE. From a representation of the recurrence equations, basic characteristics of the target architecture, and a hierarchy of objectives, which lists in order of priority the properties desired of the implementation, SAGA generates a minimal-time systolic implementation whose number of processors is likely to be problem dependent. If the problem size exceeds the size of the target architecture, CONDENSE groups the processors in the array given by SAGA so that each group is assigned, or condensed, to one processor of the target architecture. The condensation is done so as to minimize the total run time and is guided by the systolic array from SAGA. The application of SAGA and CONDENSE is to generate annotations for a functional language compiler on a homogeneous network of processors with local memory, like the CMU WARP or Intel iPSC. This paper provides the mathematical framework and an overview of the main ideas for the dataflow analysis.
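To make the condensation step concrete, the following is a minimal sketch of the kind of grouping CONDENSE performs: a one-dimensional systolic array of virtual cells (as SAGA would produce) is partitioned into contiguous, nearly equal groups, one per physical processor. The function name `condense` and the block-partitioning policy are illustrative assumptions, not the paper's actual algorithm, which is guided by a run-time cost model.

```python
# Hypothetical sketch of the CONDENSE grouping step.
# A 1-D systolic array of n_cells virtual processors is partitioned
# into n_procs contiguous blocks, one block per physical processor,
# with block sizes balanced to within one cell.

def condense(n_cells: int, n_procs: int) -> list[range]:
    """Assign contiguous groups of virtual cells to physical processors."""
    base, extra = divmod(n_cells, n_procs)
    groups, start = [], 0
    for p in range(n_procs):
        # The first `extra` processors take one additional cell.
        size = base + (1 if p < extra else 0)
        groups.append(range(start, start + size))
        start += size
    return groups

groups = condense(10, 3)
print([list(g) for g in groups])
# → [[0, 1, 2, 3], [4, 5, 6], [7, 8, 9]]
```

Contiguity matters here: neighboring systolic cells communicate most, so keeping them on the same physical processor turns inter-processor traffic into local memory accesses, which is the sense in which the condensation is "guided by the systolic array from SAGA."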
