A Technique for FPGA Synthesis Driven by Automatic Source Code Analysis and Transformations

This paper presents a technique for the automatic synthesis of high-performance FPGA-based computing machines from C language source code. It exploits the data parallelism present in the source code by applying, in hardware, automatic loop transformation techniques originally designed for optimizing compilers targeting parallel and vector computers. Performance is considered at an early stage of the design, before the low-level synthesis process, through a transformation-intensive branch-and-bound approach that searches the design space and explores area-performance tradeoffs. Furthermore, optimizations are applied at the architectural level, which yields higher benefits than gate-level optimizations; this is also supported by a library of hardware blocks implementing arithmetic and functional primitives. The technique is applied to partial and complete unrolling of a Successive Over-Relaxation code, and results are reported on the effectiveness of the area-delay estimation and on the speed-up of the generated circuit, which ranges from 5 to 30 on a Virtex-E 2000-6 with respect to an Intel Pentium 3 at 1 GHz.
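
As an illustration of the kind of source-level transformation the abstract refers to, the following minimal sketch shows a Successive Over-Relaxation sweep in C next to a version whose inner loop is partially unrolled by a factor of 2. It is not the authors' code: the grid size `N`, the five-point stencil, and the names `sor_update`, `sor_sweep`, and `sor_sweep_unrolled2` are assumptions chosen for the example.

```c
#include <assert.h>
#include <stddef.h>

#define N 9  /* hypothetical grid size; 7 interior points per row */

/* One SOR update of a grid point using a five-point stencil (assumed here). */
static double sor_update(double u[N][N], size_t i, size_t j, double omega)
{
    return (1.0 - omega) * u[i][j]
         + omega * 0.25 * (u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1]);
}

/* Reference sweep: one update per inner-loop iteration. */
void sor_sweep(double u[N][N], double omega)
{
    assert(omega > 0.0 && omega < 2.0);  /* usual relaxation range */
    for (size_t i = 1; i < N - 1; i++)
        for (size_t j = 1; j < N - 1; j++)
            u[i][j] = sor_update(u, i, j, omega);
}

/* Same sweep with the inner loop partially unrolled by 2.
 * The update order is unchanged, so the result matches sor_sweep. */
void sor_sweep_unrolled2(double u[N][N], double omega)
{
    assert(omega > 0.0 && omega < 2.0);
    for (size_t i = 1; i < N - 1; i++) {
        size_t j = 1;
        /* Main unrolled body: two updates per iteration. */
        for (; j + 1 < N - 1; j += 2) {
            u[i][j]     = sor_update(u, i, j, omega);
            u[i][j + 1] = sor_update(u, i, j + 1, omega);
        }
        /* Remainder loop for an odd number of interior points. */
        for (; j < N - 1; j++)
            u[i][j] = sor_update(u, i, j, omega);
    }
}
```

In a hardware mapping, each replicated update in the unrolled body would correspond to an additional datapath instance, so a larger unrolling factor trades area for fewer loop iterations. Note that the second update reads the value written by the first, so this dependence limits how much of the replicated logic can operate concurrently; that is one reason an area-performance search over unrolling factors is useful.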
