Using hls4ml to Map Convolutional Neural Networks on Interconnected FPGA Devices

Workflow used in this paper to partition CNNs across multiple FPGAs.
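The caption does not say how layers are assigned to devices. The sketch below is a hedged illustration only, not the paper's method. It assumes a sequential CNN whose layers each have a resource estimate (hypothetical numbers, e.g. DSP counts) and a hypothetical per-FPGA budget. It then greedily packs consecutive layers onto each FPGA until the next layer would exceed that budget. The function name `partition_layers` is likewise hypothetical.

```python
from typing import List


def partition_layers(costs: List[int], budget: int) -> List[List[int]]:
    """Greedily split a sequential CNN's layers into contiguous per-FPGA groups.

    Assumptions (hypothetical, for illustration only):
      - costs[i] is a resource estimate for layer i (e.g. DSPs).
      - budget is the usable capacity of a single FPGA.
    Layers keep their original order, so group k runs on FPGA k and its
    output activations are sent to FPGA k + 1.
    """
    groups: List[List[int]] = []
    current: List[int] = []
    used = 0
    for idx, cost in enumerate(costs):
        # A single layer larger than one device cannot be placed by this scheme.
        if cost > budget:
            raise ValueError(
                f"layer {idx} needs {cost}, exceeds per-device budget {budget}"
            )
        # Close the current device's group if this layer would overflow it.
        if used + cost > budget:
            groups.append(current)
            current, used = [], 0
        current.append(idx)
        used += cost
    # Flush the last partially filled device.
    if current:
        groups.append(current)
    return groups


if __name__ == "__main__":
    # Six layers with hypothetical costs, each FPGA limited to 1000 units.
    print(partition_layers([300, 500, 400, 200, 600, 100], 1000))
    # → [[0, 1], [2, 3], [4, 5]]
```

For a fixed layer order, this greedy split uses the fewest devices possible under the budget. It does not balance load across devices, however, and in a pipelined multi-FPGA design the slowest stage limits throughput. A real partitioner would also have to weigh inter-device bandwidth and balance.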
