A Hardware Pipeline with High Energy and Resource Efficiency for FMM Acceleration

The fast multipole method (FMM) is a promising mathematical technique that accelerates the calculation of long-range forces in large n-body problems. Existing FMM implementations on general-purpose processors are inefficient in both energy and resource usage. To mitigate these issues, we propose a hardware pipeline that accelerates three key FMM steps. The pipeline improves energy efficiency by exploiting the fine-grained parallelism of the FMM. Reusing the same pipeline across the different FMM steps reduces resource usage by 66%. Compared to state-of-the-art implementations on CPUs and GPUs, our implementation requires 15% less energy and delivers 2.61 times more floating-point operations.
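
For context on what the FMM accelerates, the direct evaluation of long-range interactions in an n-body system can be sketched as below. This is only an illustrative baseline, not the proposed pipeline: the function names `direct_potentials` and `pair_count` are hypothetical, and a 1/r potential with unit constants is assumed. The double loop visits every particle pair, which is the quadratic cost the FMM is designed to avoid.

```python
import math


def direct_potentials(positions, charges):
    """Direct O(N^2) summation of a 1/r potential at every particle.

    Assumed for illustration: unit physical constants and distinct
    particle positions (so no pair has zero separation).
    """
    n = len(positions)
    potentials = [0.0] * n
    for i in range(n):
        for j in range(i + 1, n):
            # Each pair is visited once and contributes to both particles.
            r = math.dist(positions[i], positions[j])
            potentials[i] += charges[j] / r
            potentials[j] += charges[i] / r
    return potentials


def pair_count(n):
    """Number of pairwise interactions the direct method evaluates."""
    return n * (n - 1) // 2
```

With 1,000 particles, `pair_count(1000)` gives 499,500 pair evaluations, and the count grows quadratically with the number of particles.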
