A compiler approach to fast hardware design space exploration in FPGA-based systems

The current practice of mapping computations to custom hardware implementations requires programmers to assume the role of hardware designers. In tuning the performance of their hardware implementation, designers manually apply loop transformations such as loop unrolling. designers manually apply loop transformations. For example, loop unrolling is used to expose instruction-level parallelism at the expense of more hardware resources for concurrent operator evaluation. Because unrolling also increases the amount of data a computation requires, too much unrolling can lead to a memory bound implementation where resources are idle. To negotiate inherent hardware space-time trade-offs, designers must engage in an iterative refinement cycle, at each step manually applying transformations and evaluating their impact. This process is not only error-prone and tedious but also prohibitively expensive given the large search spaces and with long synthesis times. This paper describes an automated approach to hardware design space exploration, through a collaboration between parallelizing compiler technology and high-level synthesis tools. We present a compiler algorithm that automatically explores the large design spaces resulting from the application of several program transformations commonly used in application-specific hardware designs. Our approach uses synthesis estimation techniques to quantitatively evaluate alternate designs for a loop nest computation. We have implemented this design space exploration algorithm in the context of a compilation and synthesis system called DEFACTO, and present results of this implementation on five multimedia kernels. Our algorithm derives an implementation that closely matches the performance of the fastest design in the design space, and among implementations with comparable performance, selects the smallest design. We search on average only 0.3% of the design space. This technology thus significantly raises the level of abstraction for hardware design and explores a design space much larger than is feasible for a human designer.

[1]  Markus Weinhardt Compilation and Pipeline Synthesis for Reconfigurable Architectures , 1997 .

[2]  John G. Proakis,et al.  Digital signal processing (3rd ed.): principles, algorithms, and applications , 1996 .

[3]  Dominique Lavenier,et al.  Evaluation of the streams-C C-to-FPGA compiler: an applications perspective , 2001, FPGA '01.

[4]  Ken Kennedy,et al.  Improving the ratio of memory operations to floating-point operations in loops , 1994, TOPL.

[5]  Wayne Luk,et al.  Structured hardware compilation of parallel programs , 1994 .

[6]  Pedro C. Diniz,et al.  Bridging the Gap between Compilation and Synthesis in the DEFACTO System , 2001, LCPC.

[7]  John G. Proakis,et al.  Digital Signal Processing: Principles, Algorithms, and Applications , 1992 .

[8]  Yanbing Li,et al.  Hardware-software co-design of embedded reconfigurable architectures , 2000, DAC.

[9]  Fadi J. Kurdahi,et al.  Fast area estimation to support compiler optimizations in FPGA-based reconfigurable systems , 2002, Proceedings. 10th Annual IEEE Symposium on Field-Programmable Custom Computing Machines.

[10]  Ken Kennedy,et al.  Improving register allocation for subscripted variables , 1990, PLDI '90.

[11]  Carl Ebeling,et al.  Specifying and compiling applications for RaPiD , 1998, Proceedings. IEEE Symposium on FPGAs for Custom Computing Machines (Cat. No.98TB100251).

[12]  John P. Elliott Understanding Behavioral Synthesis: A Practical Guide to High-Level Design , 1999 .

[13]  Seth Copen Goldstein,et al.  PipeRench: a co/processor for streaming multimedia acceleration , 1999, ISCA.

[14]  Raul Camposano Behavioral synthesis , 1995, IEEE Design & Test of Computers.

[15]  Rajeev Barua,et al.  Maps: a compiler-managed memory system for raw machines , 1999, ISCA.

[16]  Monk-Ping Leong,et al.  A bit-serial implementation of the international data encryption algorithm IDEA , 2000, Proceedings 2000 IEEE Symposium on Field-Programmable Custom Computing Machines (Cat. No.PR00871).

[17]  Pedro C. Diniz,et al.  Coarse-grain pipelining on multiple FPGA architectures , 2002, Proceedings. 10th Annual IEEE Symposium on Field-Programmable Custom Computing Machines.

[18]  Csaba Andras Moritz,et al.  Parallelizing applications into silicon , 1999, Seventh Annual IEEE Symposium on Field-Programmable Custom Computing Machines (Cat. No.PR00375).

[19]  John P. Elliott Understanding Behavioral Synthesis , 1999, Springer US.

[20]  Steven Derrien,et al.  Loop Tiling for Reconfigurable Accelerators , 2001, FPL.

[21]  Robert Rinker,et al.  An automated process for compiling dataflow graphs into reconfigurable hardware , 2001, IEEE Trans. Very Large Scale Integr. Syst..

[22]  B. Ramakrishna Rau,et al.  Efficient design space exploration in PICO , 2000, CASES '00.