Scheduling and behavioral transformation for parallel systems

Scheduling a program with loops efficiently and optimally is an important problem in VLSI high-level synthesis and in compilers for parallel machines. Given the signal flow graph of a DSP filter, or the data flow graph of a behavioral description, we would like to know how to transform the graph so that the final synthesized program or hardware achieves the highest possible pipeline rate. A new technique is designed that combines two transformation techniques, retiming and unfolding (also called unrolling), to obtain effective static schedules. Many fundamental properties of loop scheduling are derived through this combination. The new technique turns out to be very useful and can be generalized to other problems. For example, the problem of software pipelining in parallel compilers is modeled as a special case of our technique. Efficient polynomial-time algorithms are derived for scheduling on different parallel models and implementation styles. For uniform nested loops, multi-dimensional retiming and unfolding are defined and studied for nested loop pipelining. Based on the theoretical results, a novel technique, rotation, is designed for loop pipelining under resource constraints. The rotation technique repeatedly transforms a schedule into a more compact one. In experiments on benchmarks, rotation scheduling gives the best performance reported to date.
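As a rough illustration of the retiming transformation mentioned above, the following sketch moves one delay around a three-node loop and compares the cycle period before and after. It is a minimal sketch under stated assumptions, not the algorithm of this work: the graph, the node times, the function names `retime` and `cycle_period`, and the sign convention d_r(u,v) = d(u,v) + r(v) - r(u) (the one commonly attributed to Leiserson and Saxe) are all chosen here for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical edge encoding: (source node, target node, number of delays).
Edge = Tuple[str, str, int]


def retime(edges: List[Edge], r: Dict[str, int]) -> List[Edge]:
    """Apply retiming r, assuming d_r(u, v) = d(u, v) + r(v) - r(u)."""
    retimed = []
    for u, v, d in edges:
        new_d = d + r.get(v, 0) - r.get(u, 0)
        if new_d < 0:
            raise ValueError(f"illegal retiming: edge {u}->{v} gets {new_d} delays")
        retimed.append((u, v, new_d))
    return retimed


def cycle_period(edges: List[Edge], time: Dict[str, int]) -> int:
    """Longest total computation time along any path of zero-delay edges."""
    succ: Dict[str, List[str]] = {n: [] for n in time}
    for u, v, d in edges:
        if d == 0:
            succ[u].append(v)

    memo: Dict[str, int] = {}
    on_stack = set()

    def longest_from(n: str) -> int:
        if n in memo:
            return memo[n]
        if n in on_stack:
            raise ValueError("zero-delay cycle: no static schedule exists")
        on_stack.add(n)
        best = time[n] + max((longest_from(m) for m in succ[n]), default=0)
        on_stack.discard(n)
        memo[n] = best
        return best

    return max(longest_from(n) for n in time)


# Illustrative loop A -> B -> C -> A with two delays on the back edge.
time = {"A": 1, "B": 1, "C": 1}
edges = [("A", "B", 0), ("B", "C", 0), ("C", "A", 2)]

r = {"C": 1}  # assumed retiming: shifts one delay from C->A onto B->C
retimed = retime(edges, r)

print(cycle_period(edges, time))    # 3
print(cycle_period(retimed, time))  # 2
```

In this toy loop the total computation time is 3 and the loop carries 2 delays, so the ratio 3/2 is below what retiming alone reaches with unit-time nodes (a cycle period of 2). That gap is the kind of situation in which combining retiming with unfolding becomes relevant.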

[58]  Joos Vandewalle,et al.  Loop Optimization in Register-Transfer Scheduling for DSP-Systems , 1989, 26th ACM/IEEE Design Automation Conference.