A New HLS Allocation Algorithm for Efficient DSP Utilization in FPGAs

This paper proposes an allocation algorithm for FPGA-targeted high-level synthesis (HLS) flows. The algorithm takes a Data Flow Graph (DFG) as input and produces an optimized implementation of that DFG. It achieves efficient resource utilization through a series of tests and processing steps applied to the DFG's nodes. We compare our method against several implementation techniques under different design goals. The results show an improvement in LUT utilization of up to 89% and an improvement in power consumption of up to 37%. Finally, the maximum frequency increases significantly thanks to the efficient use of FPGA DSP blocks.
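To make the idea of node-level tests on a DFG concrete, the following is a minimal sketch and not the algorithm of the paper, whose actual tests are not detailed in this abstract. It assumes a hypothetical `Node` type and a hypothetical `allocate` function, and it uses one illustrative rule only: a multiplication whose single consumer is an addition is bound, together with that addition, to one DSP block (DSP blocks on common FPGAs support a fused multiply-add), remaining multiplications get their own DSP block, and remaining additions fall back to LUTs.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical DFG node: 'op' is one of "in", "mul", "add"."""
    name: str
    op: str
    inputs: list = field(default_factory=list)  # names of producer nodes


def allocate(nodes):
    """Illustrative greedy binding of DFG operations to DSP blocks or LUTs.

    Assumption: 'nodes' is given in topological order.
    """
    by_name = {n.name: n for n in nodes}

    # Test 1: count how many consumers each node has.
    fanout = {n.name: 0 for n in nodes}
    for n in nodes:
        for src in n.inputs:
            fanout[src] += 1

    binding = {}
    dsp_count = 0

    # Test 2: an add fed by a single-use mul is fused onto one DSP block.
    for n in nodes:
        if n.op != "add":
            continue
        for src in n.inputs:
            producer = by_name[src]
            if producer.op == "mul" and fanout[src] == 1 and src not in binding:
                binding[src] = binding[n.name] = f"DSP{dsp_count}"
                dsp_count += 1
                break

    # Fallback: leftover multiplications take a DSP block, additions use LUTs.
    for n in nodes:
        if n.op == "in" or n.name in binding:
            continue
        if n.op == "mul":
            binding[n.name] = f"DSP{dsp_count}"
            dsp_count += 1
        else:
            binding[n.name] = "LUT"
    return binding


# Example DFG: s2 = (a * b + c) + d, and m2 = c * d
example = [
    Node("a", "in"), Node("b", "in"), Node("c", "in"), Node("d", "in"),
    Node("m1", "mul", ["a", "b"]),
    Node("s1", "add", ["m1", "c"]),
    Node("s2", "add", ["s1", "d"]),
    Node("m2", "mul", ["c", "d"]),
]
binding = allocate(example)
```

In this example, `m1` and `s1` share one DSP block, `m2` gets a second one, and `s2` is left to LUT logic. A real allocator would also have to check operand bit widths, pipeline registers, and the available DSP budget, none of which this sketch models.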
