Algorithmic and memory optimizations on multiple application mapping onto FPGAs

Field Programmable Gate Arrays (FPGAs) offer a low-power, flexible accelerator alternative due to their inherent parallelism. Reprogrammability, although it is their key feature, is used almost exclusively at design time due to the constraints imposed by modern CAD tools, which can require days to run and tens of GB of RAM. In order to effectively utilize FPGAs at run time, we propose a novel methodology and a supporting toolflow that enable efficient mapping of multiple applications onto heterogeneous FPGAs. With the use of a floorplanning step, memory optimizations, and custom memory allocators, we alleviate the constraints imposed by CAD tools and provide a proof of concept that application mapping onto FPGAs can be done at run time. Experimental results demonstrate the efficiency of the introduced solution: we achieve application mapping that is 40× faster on average than a state-of-the-art approach, without performance degradation and with 12× lower memory usage on average.

[1]  Jin-Hee Cho,et al.  Trust-Based Multi-objective Optimization for Node-to-Task Assignment in Coalition Networks , 2013, FCCM 2013.

[2]  Marcel Gort,et al.  Accelerating FPGA Routing Through Parallelization and Engineering Enhancements Special Section on PAR-CAD 2010 , 2012, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.

[3]  Kathryn S. McKinley,et al.  Hoard: a scalable memory allocator for multithreaded applications , 2000, SIGP.

[4]  Harry Sidiropoulos,et al.  A Framework for Mapping Dynamic Virtual Kernels onto Heterogeneous Reconfigurable Platforms , 2014, 2014 IEEE International Parallel & Distributed Processing Symposium Workshops.

[5]  Vaughn Betz,et al.  Speeding Up FPGA Placement: Parallel Algorithms and Methods , 2014, 2014 IEEE 22nd Annual International Symposium on Field-Programmable Custom Computing Machines.

[6]  Jürgen Becker,et al.  JITPR: A framework for supporting fast application's implementation onto FPGAs , 2013, TRETS.

[7]  Marcel Gort,et al.  Analytical placement for heterogeneous FPGAs , 2012, 22nd International Conference on Field Programmable Logic and Applications (FPL).

[8]  Vaughn Betz,et al.  Efficient and Deterministic Parallel Placement for FPGAs , 2011, TODE.

[9]  Gary William Grewal,et al.  Forward-scaling, serially equivalent parallelism for FPGA placement , 2014, GLSVLSI '14.

[10]  Vaughn Betz,et al.  Timing-Driven Titan: Enabling Large Benchmarks and Exploring the Gap between Academic and Commercial CAD , 2015, TRETS.

[11]  Jason Evans April A Scalable Concurrent malloc(3) Implementation for FreeBSD , 2006 .

[12]  Husain Parvez,et al.  Exploring alternate trade-offs of placement quality versus runtime in Simulated Annealing algorithm , 2014, 2014 9th International Symposium on Reconfigurable and Communication-Centric Systems-on-Chip (ReCoSoC).

[13]  Kenneth B. Kent,et al.  The VTR project: architecture and CAD for FPGAs from verilog to routing , 2012, FPGA '12.

[14]  Tom Feist,et al.  Vivado Design Suite , 2012 .

[15]  Jianwen Zhu,et al.  Towards scalable placement for FPGAs , 2010, FPGA '10.

[16]  Philip Brisk,et al.  Parallel FPGA routing based on the operator formulation , 2014, 2014 51st ACM/EDAC/IEEE Design Automation Conference (DAC).