A PCIe DMA Architecture for Multi-Gigabyte Per Second Data Transmission

We developed a direct memory access (DMA) engine compatible with the Xilinx PCI Express (PCIe) core to provide a high-performance, low-occupancy alternative to commercial solutions. To maximize PCIe throughput while minimizing FPGA resource utilization, the DMA engine adopts a novel strategy in which the DMA address list is stored inside the FPGA rather than in the main memory of the host CPU. The FPGA design package is complemented by a simple register interface through which a Linux driver controls the DMA engine. The design is compatible with Xilinx FPGA Families 6 and 7 and operates with the Xilinx PCIe endpoint Generation 1 and 2 in all lane configurations (x1, x2, x4, x8). A multi-engine architecture is also presented, in which two x8-lane cores are used in parallel together with a PCIe bridge to fully exploit the capabilities of a PCIe Gen2 x16-lane link. A data throughput of 3461 MBytes/s has been achieved with a single PCIe Gen2 x8-lane endpoint. With the dual-engine architecture, the throughput increases to 6920 MBytes/s. The presented DMA engine is currently used in several experiments at the ANKA synchrotron light source.
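
To put the reported figures in context, the short sketch below compares them with the raw payload bandwidth of a PCIe link after 8b/10b line encoding (2.5 GT/s per lane for Gen1, 5 GT/s per lane for Gen2). It is an illustrative calculation, not part of the presented design: the function names are hypothetical, and it assumes that 1 MByte means 10^6 bytes. Under these assumptions, both reported throughputs correspond to roughly 86.5% of the encoded link bandwidth; the remainder is consistent with packet and protocol overhead on the link.

```python
# Per-lane signalling rate in transfers per second (PCIe Gen1 and Gen2).
TRANSFERS_PER_S = {1: 2.5e9, 2: 5.0e9}

# Gen1 and Gen2 use 8b/10b encoding: 8 payload bits per 10 line bits.
ENCODING_EFFICIENCY = 8 / 10


def raw_bandwidth_mbytes(gen: int, lanes: int) -> float:
    """Raw link bandwidth after line encoding, in MBytes/s (10^6 bytes/s assumed)."""
    payload_bits_per_s = TRANSFERS_PER_S[gen] * ENCODING_EFFICIENCY * lanes
    return payload_bits_per_s / 8 / 1e6


def link_efficiency(measured_mbytes: float, gen: int, lanes: int) -> float:
    """Fraction of the encoded link bandwidth reached by a measured throughput."""
    return measured_mbytes / raw_bandwidth_mbytes(gen, lanes)


if __name__ == "__main__":
    # Throughput values taken from the abstract above.
    for label, measured, lanes in [
        ("single engine, Gen2 x8", 3461, 8),
        ("dual engine, Gen2 x16", 6920, 16),
    ]:
        eff = link_efficiency(measured, gen=2, lanes=lanes)
        print(f"{label}: {eff:.1%} of {raw_bandwidth_mbytes(2, lanes):.0f} MBytes/s")
```

The near-identical efficiency of the two configurations suggests that the dual-engine arrangement scales almost linearly from the single x8 endpoint.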
