High Performance Dynamic Communication on Reconfigurable Clusters

FPGA clusters in which the FPGAs are directly linked through their Multi-Gigabit Transceiver (MGT) ports have a proven advantage over other commodity architectures for communication-bound applications. We find that standard wormhole routers need some modification to suit clusters of tightly coupled FPGAs, and we create such a router. We then generalize this router so that it is parameterized by routing algorithm, arbitration policy, number of virtual channels, and buffer size, among other options. We evaluate these designs with respect to a number of standard communication patterns and packet sizes. These results enable selection of the appropriate router for any resource budget. Finally, we find that the optimal router design varies significantly with workload. For a 512-FPGA cluster connected in an 8^3 torus, we observe that, compared with the router configuration with the best average performance, application-aware router configurations reduce average batch latency by 3%, improve throughput by 6% on average, and reduce area consumption by 50%.
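
To make the 8^3 torus topology concrete, the following minimal sketch computes coordinates, direct neighbors, and minimal hop counts for a 512-node 3D torus. It is only an illustration of the topology arithmetic, not the router described in this work. The linear node numbering and the function names (`node_to_coords`, `ring_distance`, `torus_hops`, `neighbors`) are assumptions introduced here.

```python
K = 8  # nodes per dimension; K**3 = 512 nodes, matching the 8^3 torus


def node_to_coords(node_id, k=K):
    """Map a linear node id to (x, y, z); this numbering is an assumption."""
    return (node_id % k, (node_id // k) % k, node_id // (k * k))


def ring_distance(a, b, k=K):
    """Shortest distance between two positions on a ring of k nodes."""
    d = abs(a - b)
    return min(d, k - d)


def torus_hops(src, dst, k=K):
    """Minimal hop count: the sum of the per-dimension ring distances."""
    s = node_to_coords(src, k)
    t = node_to_coords(dst, k)
    return sum(ring_distance(a, b, k) for a, b in zip(s, t))


def neighbors(node_id, k=K):
    """Ids of the six directly linked nodes (one step +/- in each dimension)."""
    coords = node_to_coords(node_id, k)
    result = []
    for dim in range(3):
        for step in (-1, 1):
            c = list(coords)
            c[dim] = (c[dim] + step) % k  # wrap-around link of the torus
            result.append(c[0] + c[1] * k + c[2] * k * k)
    return result


if __name__ == "__main__":
    num_nodes = K ** 3
    diameter = max(torus_hops(0, d) for d in range(num_nodes))
    print(num_nodes, diameter, sorted(neighbors(0)))
```

Because of the wrap-around links, the farthest node in each dimension is only K/2 = 4 hops away, so the longest minimal path in this topology is 12 hops.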
