Abstract

Many iterative or recursive applications commonly found in DSP and image processing can be represented by data-flow graphs (DFGs). A great deal of research has been done attempting to optimize such applications by applying various graph transformation techniques to the DFG in order to minimize the schedule length. One of the most effective of these techniques is retiming. In this paper, we demonstrate that the traditional retiming technique does not always achieve optimal schedules (although it can be used in combination with other techniques to do so) and propose a new graph-transformation technique, extended retiming, which does.

Index terms: Scheduling, Data-flow Graphs, Retiming, Graph Transformation, Timing Optimization

1 Introduction

Many iterative or recursive applications, such as image processing, DSP and PDE simulations, can be represented by data-flow graphs, or DFGs [4]. The nodes of a DFG represent tasks, while edges between nodes represent data dependencies among the tasks, either within iterations (an execution of all tasks) or between iterations. To model repeated steps within an algorithm, a DFG may contain loops. To meet the desired throughput, it becomes necessary to use multiple processors or multiple functional units. Due to the expense of such units, it is important to minimize the number of processors involved during execution, while maximizing the use of those processors that are included. The process of assigning a starting time and processor to each event in the DFG, known as scheduling, becomes a vital step in this process.

There are two common approaches for system-level synthesis and scheduling of parallel systems:

1. We can explicitly schedule the DFG as-is.
2. We can first apply a graph transformation technique to the DFG in order to maximize the degree of parallelism, then schedule the acyclic (or DAG) part of the resulting graph.

There are many scheduling methods [2,6,8]; hence the focus of our study will be the optimization of the DFG via graph transformation. We will later show that the second of these two methods is preferable to the first because the schedule it produces requires fewer resources.

The execution of all tasks of a DFG is called an iteration, and the length of time it takes to complete an iteration is called the schedule length of the DFG. While there are many graph transformation techniques available to us, it is possible to find graphs for which the current techniques will not produce a transformed DFG having minimum schedule length. We will demonstrate this in this paper, and propose a new transformation technique which does deliver optimal results. When compared with the traditional methods, our new technique quickly and easily produces a transformed graph without increasing the size of the DFG.

A great deal of research has been done attempting to optimize the schedule of tasks for an application after applying various graph transformation techniques to the application's DFG. One of the more effective of these techniques is retiming [1,7], where delays are redistributed among the edges so that the application's function remains the same, but the length of the longest zero-delay path, called the clock period of the DFG G and denoted cl
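As an illustration of the retiming idea described above, the following is a minimal sketch, not the paper's algorithm. It assumes a DFG given as a dictionary of node computation times plus a list of `(source, target, delays)` edges, and it assumes the Leiserson-Saxe sign convention, in which the retimed delay count of an edge u -> v is d(e) + r(v) - r(u); other papers use the opposite sign. The function names `clock_period` and `retime` and the three-node example graph are hypothetical and do not come from the paper.

```python
from functools import lru_cache


def clock_period(times, edges):
    """Return the largest total computation time along any zero-delay path.

    Assumes every cycle of the graph carries at least one delay, so the
    zero-delay subgraph is acyclic and the recursion terminates.
    """
    # Keep only the zero-delay edges: these are intra-iteration dependencies.
    succ = {v: [] for v in times}
    for u, v, d in edges:
        if d == 0:
            succ[u].append(v)

    @lru_cache(maxsize=None)
    def longest_from(u):
        # Longest zero-delay path starting at u, including u's own time.
        return times[u] + max((longest_from(v) for v in succ[u]), default=0)

    return max(longest_from(v) for v in times)


def retime(edges, r):
    """Redistribute delays: d_r(u -> v) = d + r(v) - r(u) (assumed convention)."""
    new_edges = []
    for u, v, d in edges:
        nd = d + r.get(v, 0) - r.get(u, 0)
        if nd < 0:
            raise ValueError(f"illegal retiming: edge {u}->{v} gets {nd} delays")
        new_edges.append((u, v, nd))
    return new_edges


# Hypothetical loop A -> B -> C -> A with two delays on the back edge.
times = {"A": 1, "B": 1, "C": 1}
edges = [("A", "B", 0), ("B", "C", 0), ("C", "A", 2)]

print(clock_period(times, edges))      # 3: the zero-delay path A -> B -> C
retimed = retime(edges, {"C": 1})      # move one delay from C -> A onto B -> C
print(clock_period(times, retimed))    # 2: the zero-delay path A -> B
```

The total number of delays around the loop is unchanged by the retiming (two before and after), which is why the function computed by the loop is preserved while the longest zero-delay path becomes shorter.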
References

[1] Keshab K. Parhi, et al. Static Rate-Optimal Scheduling of Iterative Data-Flow Programs via Optimum Unfolding. IEEE Trans. Computers, 1991.
[2] Markku Renfors, et al. The maximum sampling rate of digital filters under hardware speed constraints. 1981.
[3] Keshab K. Parhi, et al. High-level DSP synthesis using concurrent transformations, scheduling, and allocation. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 1995.
[4] Rajesh K. Gupta, et al. Faster maximum and minimum mean cycle algorithms for system-performance analysis. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 1998.
[5] Giovanni De Micheli, et al. Synthesis and Optimization of Digital Circuits. 1994.
[6] Edwin Hsing-Mean Sha, et al. Static scheduling for synthesis of DSP algorithms on various models. J. VLSI Signal Process., 1995.
[7] Frédéric Vivien, et al. Combining Retiming and Scheduling Techniques for Loop Parallelization and Loop Tiling. Parallel Process. Lett., 1997.
[8] Yves Robert, et al. Circuit Retiming Applied to Decomposed Software Pipelining. IEEE Trans. Parallel Distributed Syst., 1998.
[9] Edwin Hsing-Mean Sha, et al. Scheduling Data-Flow Graphs via Retiming and Unfolding. IEEE Trans. Parallel Distributed Syst., 1997.
[10] Charles E. Leiserson, et al. Retiming synchronous circuitry. Algorithmica, 1988.
[11] Edwin Hsing-Mean Sha, et al. Rotation Scheduling: A Loop Pipelining Algorithm. 30th ACM/IEEE Design Automation Conference, 1993.
[12] S. Tongsima, et al. Communication-sensitive loop scheduling for DSP applications. IEEE Trans. Signal Process., 1997.
[13] Edwin Hsing-Mean Sha, et al. Loop Pipelining for Scheduling Multi-Dimensional Systems via Rotation. 31st Design Automation Conference, 1994.
[14] Keshab K. Parhi, et al. Resource-constrained loop list scheduler for DSP algorithms. J. VLSI Signal Process., 1995.