JetStream: An open-source high-performance PCI Express 3 streaming library for FPGA-to-Host and FPGA-to-FPGA communication

Many FPGA-based accelerators are constrained by the resources available on a single device, so multi-FPGA solutions can be necessary for building more capable systems. Existing PCIe solutions provide only FPGA-to-Host communication. In this paper we present JetStream, an open-source modular PCIe 3 library that supports not only fast FPGA-to-Host communication but also direct FPGA-to-FPGA communication, which fully bypasses the memory subsystem. The direct mode saves memory bandwidth in multicast modes and allows multiple FPGAs to be connected in various software-defined topologies. We show the benefits of JetStream with a large FIR filter spanning four FPGA boards, achieving throughputs of up to 7.09 GB/s per link. Using direct FPGA-to-FPGA transfers reduces the required memory bandwidth by up to 75%.
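One way to see where a figure like 75% can come from is a simple counting model. The sketch below is an illustration under stated assumptions, not the paper's own derivation: it assumes the four boards form a linear chain, that without direct transfers every board reads its input from host memory and writes its output back, and that with direct transfers only the first read and the last write touch host memory. The function names are hypothetical.

```python
def host_memory_passes(num_fpgas: int, direct: bool) -> int:
    """Count how often the data stream crosses host memory in a linear chain.

    Assumed model (not taken from the paper):
    - staged mode: each FPGA reads from and writes to host memory (2 passes each)
    - direct mode: only the first read and the last write use host memory
    """
    if num_fpgas < 1:
        raise ValueError("need at least one FPGA")
    return 2 if direct else 2 * num_fpgas


def memory_bandwidth_reduction(num_fpgas: int) -> float:
    """Fraction of host-memory traffic saved by direct FPGA-to-FPGA transfers."""
    staged = host_memory_passes(num_fpgas, direct=False)
    direct = host_memory_passes(num_fpgas, direct=True)
    return 1 - direct / staged


if __name__ == "__main__":
    # Four boards: 8 passes staged vs. 2 passes direct
    print(f"{memory_bandwidth_reduction(4):.0%}")
```

Under this model the saving grows with chain length, and a chain of four boards gives the 75% quoted in the abstract; other topologies or multicast patterns would need a different count.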
