Off-Loading LET Generation to PEACH2: A Switching Hub for High Performance GPU Clusters

A hardware local essential tree (LET) generator for N-body simulation is implemented on the FPGA of PEACH2 (PCI Express Adaptive Communication Hub ver2), a low-latency switching hub for high-performance GPU clusters. Using pipelined on-the-fly execution with a multipole acceptance criterion (MAC) judging module and a data updating module, LET generation is 2.2 times faster than on the CPU. When data communication is included, the performance is 7.2 times that of the CPU case.
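To make the role of the multipole acceptance criterion concrete, the following is a minimal software sketch of LET generation in the style of a Barnes-Hut tree code. It is not the PEACH2 hardware design and does not model the FPGA pipeline. The names `Cell`, `mac_accepts`, and `generate_let`, the opening-angle form of the criterion (`size / distance < theta`), and the representation of the remote domain as an axis-aligned box are all assumptions made for illustration.

```python
import math
from dataclasses import dataclass, field
from typing import List, Tuple

Vec = Tuple[float, float, float]


@dataclass
class Cell:
    """Hypothetical tree cell: a cube with a monopole (mass, center of mass)."""
    size: float                      # edge length of the cubic cell
    mass: float                      # total mass inside the cell
    com: Vec                         # center of mass
    children: List["Cell"] = field(default_factory=list)


def min_distance_to_box(p: Vec, lo: Vec, hi: Vec) -> float:
    """Shortest distance from point p to the axis-aligned box [lo, hi]."""
    return math.sqrt(sum(max(l - x, 0.0, x - h) ** 2
                         for x, l, h in zip(p, lo, hi)))


def mac_accepts(cell: Cell, lo: Vec, hi: Vec, theta: float) -> bool:
    """Assumed opening-angle criterion: accept if size / distance < theta."""
    d = min_distance_to_box(cell.com, lo, hi)
    return d > 0.0 and cell.size / d < theta


def generate_let(root: Cell, lo: Vec, hi: Vec, theta: float = 0.5):
    """Collect the (mass, com) entries a remote domain [lo, hi] needs.

    A cell that passes the criterion is sent as a single entry; otherwise
    it is opened and its children are examined. Leaves are always sent.
    """
    let = []
    stack = [root]
    while stack:
        cell = stack.pop()
        if not cell.children or mac_accepts(cell, lo, hi, theta):
            let.append((cell.mass, cell.com))
        else:
            stack.extend(cell.children)
    return let
```

In this sketch a distant remote domain receives one summarized entry for the whole cell, while a nearby domain forces the cell to be opened and receives its children instead. That per-cell accept-or-open decision is the kind of judgment the abstract assigns to the MAC judging module.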
