Design of 3D FFTs with FPGA clusters

The three-dimensional Fast Fourier Transform (3D FFT) is widely used in scientific applications. Distributed 3D FFTs require global communication, which becomes a serious concern when strong scaling is required, as in long-timescale molecular dynamics simulations. In this paper, we propose a parameterized 3D FFT design that targets 3D-torus FPGA-based networks of various sizes. Characteristics include direct FPGA-FPGA communication links, support for various internal switch designs, and table-based routing, which saves chip area and routing cycles. We find that, even assuming extremely conservative parameters, we are able to run the 16³ FFT in 3.9 μs, the 32³ FFT in 5.46 μs, the 64³ FFT in 9.52 μs, and the 128³ FFT in 25.72 μs. These results indicate that clusters based on commodity FPGAs are likely to be appropriate when strong scaling is needed in applications limited by the 3D FFT.
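To illustrate why a distributed 3D FFT needs global communication, the sketch below computes a 3D FFT with the standard row-column decomposition: one pass of 1D FFTs along each of the three axes. Each pass needs every point along its axis, so when the cube is spread over many nodes the data generally has to be redistributed between passes. This is a minimal single-node Python sketch, not the FPGA design described above; the names `fft1d`, `fft3d`, and `dft3d_direct` are hypothetical and chosen only for illustration.

```python
import cmath


def fft1d(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return list(x)
    even = fft1d(x[0::2])
    odd = fft1d(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def fft3d(a):
    """3D FFT of an n x n x n nested list, computed one axis at a time."""
    n = len(a)
    a = [[list(row) for row in plane] for plane in a]  # work on a copy
    # Pass 1: 1D FFTs along z (the innermost index).
    for x in range(n):
        for y in range(n):
            a[x][y] = fft1d(a[x][y])
    # Pass 2: 1D FFTs along y. In a distributed setting, gathering each
    # y-line is where data would have to move between nodes.
    for x in range(n):
        for z in range(n):
            line = fft1d([a[x][y][z] for y in range(n)])
            for y in range(n):
                a[x][y][z] = line[y]
    # Pass 3: 1D FFTs along x, again requiring a full line per transform.
    for y in range(n):
        for z in range(n):
            line = fft1d([a[x][y][z] for x in range(n)])
            for x in range(n):
                a[x][y][z] = line[x]
    return a


def dft3d_direct(a):
    """Direct 3D DFT from the definition; used only as a reference check."""
    n = len(a)
    out = [[[0j] * n for _ in range(n)] for _ in range(n)]
    for u in range(n):
        for v in range(n):
            for w in range(n):
                s = 0j
                for x in range(n):
                    for y in range(n):
                        for z in range(n):
                            phase = -2j * cmath.pi * (u * x + v * y + w * z) / n
                            s += a[x][y][z] * cmath.exp(phase)
                out[u][v][w] = s
    return out


if __name__ == "__main__":
    n = 4
    cube = [[[complex((7 * x + 3 * y + z) % 5, (x + 2 * y + 3 * z) % 3)
              for z in range(n)] for y in range(n)] for x in range(n)]
    fast, ref = fft3d(cube), dft3d_direct(cube)
    err = max(abs(fast[u][v][w] - ref[u][v][w])
              for u in range(n) for v in range(n) for w in range(n))
    print("max abs difference vs direct DFT:", err)
```

The three passes are independent of how the 1D transform is implemented, so `fft1d` could be swapped for any other 1D FFT without changing the structure of `fft3d`.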
