Paper information - Exact and Heuristic Allocation of Multi-kernel Applications to Multi-FPGA Platforms

Exact and Heuristic Allocation of Multi-kernel Applications to Multi-FPGA Platforms

FPGA-based accelerators have demonstrated high energy efficiency compared to GPUs and CPUs. However, single-FPGA designs may not achieve sufficient task parallelism. In this work, we optimize the mapping of high-performance multi-kernel applications, such as Convolutional Neural Networks, to multi-FPGA platforms. First, we formulate the system-level optimization problem, which chooses, within a huge design space, the parallelism and the number of compute units for each kernel in the pipeline. We then solve it by combining Geometric Programming, which produces the optimum-performance solution under resource and DRAM bandwidth constraints, with a heuristic allocator that places the compute units on the FPGA cluster.
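The two-step flow described above (a continuous optimization of compute-unit counts followed by a heuristic placement) can be illustrated with a minimal Python sketch. It is not the paper's formulation. It assumes a single scalar resource cost per compute unit, ignores DRAM bandwidth and per-kernel parallelism, and uses first-fit decreasing as a stand-in for the heuristic allocator. The names `Kernel`, `relaxed_cu_counts`, and `first_fit_decreasing` are hypothetical. Under these assumptions the geometric program "minimize II subject to work_k / n_k <= II and sum_k cost_k * n_k <= budget" has a closed-form solution, so no solver is needed.

```python
import math
from dataclasses import dataclass


@dataclass
class Kernel:
    name: str
    work: float     # assumed work per pipeline iteration (arbitrary units)
    cu_cost: float  # assumed fraction of one FPGA used by one compute unit


def relaxed_cu_counts(kernels, total_budget):
    """Continuous relaxation of the toy problem.

    Minimize the initiation interval II = max_k work_k / n_k subject to
    sum_k cu_cost_k * n_k <= total_budget. At the optimum every stage is
    balanced (work_k / n_k = II), which gives II in closed form.
    """
    ii = sum(k.work * k.cu_cost for k in kernels) / total_budget
    return {k.name: k.work / ii for k in kernels}, ii


def round_down(counts):
    """Round to integers, keeping at least one compute unit per kernel."""
    return {name: max(1, math.floor(n)) for name, n in counts.items()}


def first_fit_decreasing(kernels, counts, num_fpgas, capacity=1.0):
    """Place compute units on FPGAs, largest first, in the first FPGA that fits."""
    units = [(k.cu_cost, k.name) for k in kernels for _ in range(counts[k.name])]
    units.sort(reverse=True)
    free = [capacity] * num_fpgas
    placement = [[] for _ in range(num_fpgas)]
    for cost, name in units:
        for i in range(num_fpgas):
            if cost <= free[i] + 1e-9:  # small tolerance for float rounding
                free[i] -= cost
                placement[i].append(name)
                break
        else:
            raise ValueError(f"no FPGA has room for a compute unit of {name}")
    return placement


# Hypothetical three-kernel pipeline mapped to two FPGAs.
kernels = [
    Kernel("conv1", work=8.0, cu_cost=0.25),
    Kernel("conv2", work=4.0, cu_cost=0.25),
    Kernel("fc", work=2.0, cu_cost=0.125),
]
relaxed, ideal_ii = relaxed_cu_counts(kernels, total_budget=2.0)
counts = round_down(relaxed)
placement = first_fit_decreasing(kernels, counts, num_fpgas=2)
achieved_ii = max(k.work / counts[k.name] for k in kernels)
print(counts, placement, ideal_ii, achieved_ii)
```

In this toy instance, rounding the relaxed counts down keeps the placement feasible but raises the initiation interval above the relaxed optimum. The gap between `ideal_ii` and `achieved_ii` is the kind of loss that a combined exact and heuristic approach tries to keep small.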
