Interplay of loop unrolling and multidimensional memory partitioning in HLS

This paper addresses memory partitioning in high-level synthesis (HLS) for FPGAs. It focuses on the area overhead that partitioning introduces and examines how that overhead interacts with loop unrolling, a transformation commonly used in HLS. As a practical outcome, the study proposes reducing the area overhead by appropriately controlling the degree of loop unrolling. Experimental results confirm the significance of the analysis and the effectiveness of the proposed optimization technique.
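
To make the link between unrolling and partitioning concrete, the following is a minimal sketch and not the paper's method. It assumes single-port banks, a one-dimensional array, and plain cyclic partitioning (element i is stored in bank i mod N); the function names and the stride parameter are hypothetical and chosen only for illustration. It searches for the smallest bank count that lets every access of one unrolled iteration reach a distinct bank, which suggests why a larger unroll factor tends to require more banks and therefore more addressing and multiplexing logic.

```python
def banks_touched(unroll, num_banks, stride=1, base=0):
    """Bank index of each access issued by one unrolled loop iteration.

    Assumes cyclic partitioning: element i is stored in bank i % num_banks.
    """
    return [(base + k * stride) % num_banks for k in range(unroll)]


def is_conflict_free(unroll, num_banks, stride=1):
    """True if the unrolled accesses always hit pairwise distinct banks."""
    for base in range(num_banks):  # every possible alignment of the first access
        banks = banks_touched(unroll, num_banks, stride, base)
        if len(set(banks)) != len(banks):
            return False
    return True


def min_banks(unroll, stride=1):
    """Smallest bank count giving conflict-free access for this unroll factor."""
    n = unroll  # at least one bank per parallel access is needed
    while not is_conflict_free(unroll, n, stride):
        n += 1
    return n


if __name__ == "__main__":
    for unroll in (1, 2, 4, 8):
        print(unroll, min_banks(unroll, stride=1), min_banks(unroll, stride=2))
```

Under these assumptions the required bank count never falls below the unroll factor, and a stride that shares a factor with the bank count forces additional banks. A lower unroll factor therefore trades parallelism for a smaller partitioned memory, which is the kind of trade-off the abstract describes.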
