Paper information - A Hardware Pipeline with High Energy and Resource Efficiency for FMM Acceleration

The fast multipole method (FMM) is a promising mathematical technique that accelerates the calculation of long-range forces in large n-body problems. Existing FMM implementations on general-purpose processors are inefficient in both energy and resource usage. To mitigate these issues, we propose a hardware pipeline that accelerates three key FMM steps. The pipeline improves energy efficiency by exploiting the fine-grained parallelism of the FMM. Reusing the same pipeline for different FMM steps reduces resource usage by 66%. Compared to state-of-the-art implementations on CPUs and GPUs, our implementation requires 15% less energy and delivers 2.61 times more floating-point operations.
