Dataflow to Hardware Synthesis Framework on FPGAs

We present a dataflow-based performance estimation and synthesis framework that helps hardware (HW) designers quantify algorithm performance and synthesize their designs onto Field Programmable Gate Arrays (FPGAs). Digital Signal Processing (DSP) systems are typically designed through a series of gradual architectural choices made in HW refinement steps. These decisions rely on performance quantification by high-level DSP algorithm developers and HW implementation engineers. The main obstacle to this refinement is providing reasonably accurate performance estimates early enough to guide HW designers in Design Space Exploration (DSE). Quantifying the performance of a design is especially challenging when resources are limited. We use dataflow models that describe hardware detail only as far as necessary. Dataflow-based performance estimation efficiently generates qualitative and quantitative parameters for assessing HW candidates. Reconfigurable logic can be used to off-load the primary computational kernel onto a custom computing machine, reducing execution time by an order of magnitude compared with kernel execution on a general-purpose processor. Specifically, FPGAs can accelerate these kernels using custom hardware logic implementations. In this paper, we demonstrate a framework for algorithm acceleration that goes from the dataflow model to a synthesized HDL design. Experimental results show a linear speedup as reasonably small processing elements are added to the FPGA, compared with a software implementation running on a typical general-purpose processor.
