High Performance Communication on Reconfigurable Clusters

FPGA clusters in which the FPGAs are directly linked through their Multi-Gigabit Transceivers (MGTs) have a proven advantage over other commodity architectures for communication-bound applications. To date, however, communication infrastructure for such clusters has generally taken one of two approaches: nearest-neighbor only, which is fast but has limited utility, or processor-based, which is general but relatively slow. What is needed is a systematic exploration of the communication microarchitecture of these systems, as has been done for HPC clusters and for Networks on Chip (NoCs) on both FPGAs and ASICs. Our first contribution is the finding that the properties of clusters of tightly coupled FPGAs substantially influence the router design space. We create a candidate router and generalize it so that it is parameterized by routing algorithm, arbitration policy, and number of virtual channels (VCs). We have created a cycle-accurate simulator and validated it on a four-FPGA system. We evaluate the design space with respect to a number of standard communication patterns and packet sizes. These results enable selection of the appropriate router for any resource budget. We find that the optimal router design varies significantly with the workload. We present a framework that helps determine appropriate parameters for a given application and generates the HDL design. We observe that, for a 512-FPGA cluster, application-aware router selection can lead to a substantial improvement in performance or reduction in area compared with the router configuration that has the best average performance.
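The selection step described above can be pictured as a search over a small parameterized design space under an area budget. The following is a minimal sketch of that idea only, not the paper's framework: the configuration names (`dimension_order`, `adaptive`, `round_robin`, `oldest_first`), the helper names, and the toy area and latency models are all hypothetical placeholders. In practice the area figure would come from synthesis results and the latency figure from the cycle-accurate simulator for the workload of interest.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, List, Optional, Sequence


@dataclass(frozen=True)
class RouterConfig:
    """One point in the router design space (the three parameters named in the abstract)."""
    routing: str            # routing algorithm (placeholder names below)
    arbitration: str        # arbitration policy (placeholder names below)
    virtual_channels: int   # number of VCs


def design_space(routings: Sequence[str],
                 arbitrations: Sequence[str],
                 vc_counts: Sequence[int]) -> List[RouterConfig]:
    """Enumerate every combination of the three parameters."""
    return [RouterConfig(r, a, v)
            for r, a, v in product(routings, arbitrations, vc_counts)]


def select_router(configs: Sequence[RouterConfig],
                  area_of: Callable[[RouterConfig], float],
                  latency_of: Callable[[RouterConfig], float],
                  area_budget: float) -> Optional[RouterConfig]:
    """Return the lowest-latency configuration that fits the area budget, or None."""
    feasible = [c for c in configs if area_of(c) <= area_budget]
    return min(feasible, key=latency_of) if feasible else None


# Toy cost models: made-up numbers for illustration, NOT measured results.
def toy_area(c: RouterConfig) -> float:
    return 100 * c.virtual_channels + (50 if c.routing == "adaptive" else 0)


def toy_latency(c: RouterConfig) -> float:
    return 1000 / c.virtual_channels + (0 if c.routing == "adaptive" else 20)


if __name__ == "__main__":
    space = design_space(["dimension_order", "adaptive"],
                         ["round_robin", "oldest_first"],
                         [1, 2, 4])
    print(select_router(space, toy_area, toy_latency, area_budget=250))
```

Swapping in a different `latency_of` function per application is what makes the selection application-aware in this sketch; using a single latency model averaged over all workloads would correspond to the best-average baseline.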
