Cost sensitive modulo scheduling in a loop accelerator synthesis system

Scheduling algorithms used in compilers traditionally focus on goals such as reducing schedule length and register pressure or producing compact code. In the context of a hardware synthesis system where the schedule is used to determine various components of the hardware, including datapath, storage, and interconnect, the goals of a scheduler change drastically. In addition to achieving the traditional goals, the scheduler must proactively make decisions to ensure efficient hardware is produced. This paper proposes two exact solutions for cost sensitive modulo scheduling, one based on an integer linear programming formulation and another based on branch-and-bound search. To achieve reasonable compilation times, decomposition techniques to break down the complex scheduling problem into phase ordered sub-problems are proposed. The decomposition techniques work either by partitioning the dataflow graph into smaller subgraphs and optimally scheduling the subgraphs, or by splitting the scheduling problem into two phases, time slot and resource assignment. The effectiveness of cost sensitive modulo scheduling in minimizing the costs of function units, register structures, and interconnection wires are evaluated within a fully automatic synthesis system for loop accelerators. The cost sensitive modulo scheduler increases the efficiency of the resulting hardware significantly compared to both traditional cost unaware and greedy cost aware modulo schedulers

[1]  Pierre G. Paulin,et al.  Force-directed scheduling for the behavioral synthesis of ASICs , 1989, IEEE Trans. Comput. Aided Des. Integr. Circuits Syst..

[2]  Youn-Long Lin,et al.  A new integer linear programming formulation for the scheduling problem in data path synthesis , 1989, 1989 IEEE International Conference on Computer-Aided Design. Digest of Technical Papers.

[3]  Maya Gokhale,et al.  Data-parallel C on a reconfigurable logic array , 2005, The Journal of Supercomputing.

[4]  A. Smith,et al.  PRISM-II compiler and architecture , 1993, [1993] Proceedings IEEE Workshop on FPGAs for Custom Computing Machines.

[5]  Alexandre E. Eichenberger,et al.  Stage scheduling: a technique to reduce the register requirements of a module schedule , 1995, MICRO 1995.

[6]  Alice C. Parker,et al.  Sehwa: a software package for synthesis of pipelines from behavioral specifications , 1988, IEEE Trans. Comput. Aided Des. Integr. Circuits Syst..

[7]  Guang R. Gao,et al.  Minimizing register requirements under resource-constrained rate-optimal software pipelining , 1994, Proceedings of MICRO-27. The 27th Annual IEEE/ACM International Symposium on Microarchitecture.

[8]  B. Ramakrishna Rau,et al.  Iterative modulo scheduling: an algorithm for software pipelining loops , 1994, MICRO 27.

[9]  Alexandre E. Eichenberger,et al.  Optimum modulo schedules for minimum register requirements , 1995 .

[10]  Scott A. Mahlke,et al.  High-level synthesis of nonprogrammable hardware accelerators , 2000, Proceedings IEEE International Conference on Application-Specific Systems, Architectures, and Processors.

[11]  Roozbeh Jafari,et al.  Global resource sharing for synthesis of control data flow graphs on FPGAs , 2003, Proceedings 2003. Design Automation Conference (IEEE Cat. No.03CH37451).

[12]  Alexandre E. Eichenberger,et al.  Stage scheduling: a technique to reduce the register requirements of a modulo schedule , 1995, Proceedings of the 28th Annual International Symposium on Microarchitecture.

[13]  Alok N. Choudhary,et al.  A system for synthesizing optimized FPGA hardware from Matlab(R) , 2001, IEEE/ACM International Conference on Computer Aided Design. ICCAD 2001. IEEE/ACM Digest of Technical Papers (Cat. No.01CH37281).

[14]  Bruce A. Draper,et al.  Cameron: high level language compilation for reconfigurable systems , 1999, 1999 International Conference on Parallel Architectures and Compilation Techniques (Cat. No.PR00425).

[15]  Csaba Andras Moritz,et al.  Parallelizing applications into silicon , 1999, Seventh Annual IEEE Symposium on Field-Programmable Custom Computing Machines (Cat. No.PR00375).

[16]  Hugo De Man,et al.  Cathedral-III : architecture-driven high-level synthesis for high throughput DSP applications , 1991, 28th ACM/IEEE Design Automation Conference.

[17]  Guang R. Gao,et al.  Optimal Modulo Scheduling Through Enumeration , 2004, International Journal of Parallel Programming.

[18]  John Wawrzynek,et al.  The Garp Architecture and C Compiler , 2000, Computer.

[19]  Josep Llosa,et al.  Reduced code size modulo scheduling in the absence of hardware support , 2002, MICRO.

[20]  Daniel Gajski,et al.  Component selection for high-performance pipelines , 1996, IEEE Trans. Very Large Scale Integr. Syst..

[21]  Daniel P. Siewiorek,et al.  Facet: A Procedure for the Automated Synthesis of Digital Systems , 1983, 20th Design Automation Conference Proceedings.

[22]  Alexandre E. Eichenberger,et al.  Efficient formulation for optimal modulo schedulers , 1997, PLDI '97.

[23]  John J. Granacki,et al.  DEFACTO: A Design Environment for Adaptive Computing Technology , 1999, IPPS/SPDP Workshops.

[24]  Scott A. Mahlke,et al.  Bitwidth cognizant architecture synthesis of custom hardwareaccelerators , 2001, IEEE Trans. Comput. Aided Des. Integr. Circuits Syst..

[25]  Chong-Min Kyung,et al.  Fast and near optimal scheduling in automatic data path synthesis , 1991, 28th ACM/IEEE Design Automation Conference.

[26]  Josep Llosa,et al.  Swing module scheduling: a lifetime-sensitive approach , 1996, Proceedings of the 1996 Conference on Parallel Architectures and Compilation Technique.

[27]  Scott A. Mahlke,et al.  PICO-NPA: High-Level Synthesis of Nonprogrammable Hardware Accelerators , 2002, J. VLSI Signal Process..

[28]  Alexandre E. Eichenberger,et al.  Minimum register requirements for a modulo schedule , 1994, MICRO 27.

[29]  Fadi J. Kurdahi,et al.  Module assignment and interconnect sharing in register-transfer synthesis of pipelined data paths , 1989, 1989 IEEE International Conference on Computer-Aided Design. Digest of Technical Papers.

[30]  Maya Gokhale,et al.  NAPA C: compiling for a hybrid RISC/FPGA architecture , 1998, Proceedings. IEEE Symposium on FPGAs for Custom Computing Machines (Cat. No.98TB100251).

[31]  Richard A. Huff,et al.  Lifetime-sensitive modulo scheduling , 1993, PLDI '93.