Throughput Optimization for High-Level Synthesis Using Resource Constraints

Programming productivity of FPGA devices remains a significant challenge, despite the emergence of robust high level synthesis tools to automatically transform codes written in high-level languages into RTL implementations. Focusing on a class of programs with regular loop bounds and array accesses, the polyhedral compilation framework provides a convenient environment to automate many of the manual program transformation tasks that are still needed to improve the QoR of the HLS tool. In this work, we demonstrate that traditional affine loop transformations are not always enough to achieve the best throughput, determined by the Initiation Interval (II) for loop pipelining, and other transformations such as Index-Set Splitting (ISS) can lead to better performance. We develop a customized affine+ISS optimization algorithm that aims at reducing the II of pipelined inner loops to reduce the program latency. We report experimental results on numerous affine computations, showing significant latency and energy improvements.

[1]  Pedro C. Diniz,et al.  A compiler approach to fast hardware design space exploration in FPGA-based systems , 2002, PLDI '02.

[2]  David Parello,et al.  Semi-Automatic Composition of Loop Transformations for Deep Parallelism and Memory Hierarchies , 2006, International Journal of Parallel Programming.

[3]  Jason Cong,et al.  Improving high level synthesis optimization opportunity through polyhedral transformations , 2013, FPGA '13.

[4]  Pedro C. Diniz,et al.  Bridging the Gap between Compilation and Synthesis in the DEFACTO System , 2001, LCPC.

[5]  Jason Cong,et al.  Improving polyhedral code generation for high-level synthesis , 2013, 2013 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS).

[6]  High-Level Synthesis Tools for Xilinx FPGAs , 2010 .

[7]  Jason Cong,et al.  Memory partitioning for multidimensional arrays in high-level synthesis , 2013, 2013 50th ACM/EDAC/IEEE Design Automation Conference (DAC).

[8]  Alain Darte,et al.  Optimizing remote accesses for offloaded kernels: application to high-level synthesis for FPGA , 2013, DATE 2013.

[9]  Paul Feautrier,et al.  Some efficient solutions to the affine scheduling problem. Part II. Multidimensional time , 1992, International Journal of Parallel Programming.

[10]  Yun Liang,et al.  High-Level Synthesis: Productivity, Performance, and Software Constraints , 2012, J. Electr. Comput. Eng..

[11]  François Irigoin,et al.  Supernode partitioning , 1988, POPL '88.

[12]  Sven Verdoolaege,et al.  isl: An Integer Set Library for the Polyhedral Model , 2010, ICMS.

[13]  Jason Cong,et al.  Automatic memory partitioning and scheduling for throughput and power optimization , 1999, 2009 IEEE/ACM International Conference on Computer-Aided Design - Digest of Technical Papers.

[14]  Uday Bondhugula,et al.  A practical automatic polyhedral parallelizer and locality optimizer , 2008, PLDI '08.

[15]  Ken Kennedy,et al.  Automatic translation of FORTRAN programs to vector form , 1987, TOPL.

[16]  Jason Cong,et al.  High-Level Synthesis for FPGAs: From Prototyping to Deployment , 2011, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

[17]  Jason Cong,et al.  Memory partitioning and scheduling co-optimization in behavioral synthesis , 2012, 2012 IEEE/ACM International Conference on Computer-Aided Design (ICCAD).

[18]  Michael Wolfe,et al.  More iteration space tiling , 1989, Proceedings of the 1989 ACM/IEEE Conference on Supercomputing (Supercomputing '89).

[19]  Philippe Coussy,et al.  High-Level Synthesis: from Algorithm to Digital Circuit , 2008 .

[20]  Cédric Bastoul,et al.  Code generation in the polyhedral model is easier than you think , 2004, Proceedings. 13th International Conference on Parallel Architecture and Compilation Techniques, 2004. PACT 2004..

[21]  Jason Cong,et al.  Polyhedral-based data reuse optimization for configurable computing , 2013, FPGA '13.

[22]  Alexander Aiken,et al.  Perfect Pipelining: A New Loop Parallelization Technique , 1988, ESOP.

[23]  Qiang Liu,et al.  Combining Data Reuse With Data-Level Parallelization for FPGA-Targeted Hardware Compilation: A Geometric Programming Framework , 2008, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

[24]  Gilles Villard,et al.  Lattice-based memory allocation , 2003, IEEE Transactions on Computers.

[25]  Alain Darte,et al.  Optimizing DDR-SDRAM communications at C-level for automatically-generated hardware accelerators an experience with the Altera C2H HLS tool , 2010, ASAP 2010 - 21st IEEE International Conference on Application-specific Systems, Architectures and Processors.

[26]  George A. Constantinides,et al.  Optimizing SDRAM bandwidth for custom FPGA loop accelerators , 2012, FPGA '12.

[27]  Albert Cohen,et al.  Polyhedral-Model Guided Loop-Nest Auto-Vectorization , 2009, 2009 18th International Conference on Parallel Architectures and Compilation Techniques.

[28]  Uday Bondhugula,et al.  Automatic Transformations for Communication-Minimized Parallelization and Locality Optimization in the Polyhedral Model , 2008, CC.

[29]  Martin Griebl,et al.  Index Set Splitting , 2000, International Journal of Parallel Programming.

[30]  Allen,et al.  Optimizing Compilers for Modern Architectures , 2004 .

[31]  P. Feautrier Parametric integer programming , 1988 .

[32]  Alain Darte,et al.  Optimizing remote accesses for offloaded kernels: Application to high-level synthesis for FPGA , 2012, 2013 Design, Automation & Test in Europe Conference & Exhibition (DATE).