PetFMM—A dynamically load‐balancing parallel fast multipole library

Fast algorithms for the computation of N-body problems can be broadly classified into mesh-based interpolation methods, and hierarchical or multiresolution methods. To this latter class belongs the well-known fast multipole method (FMM), which offers O(N) complexity. The FMM is a complex algorithm, and the programming difficulty associated with it has arguably diminished its impact by acting as a barrier to adoption. This paper presents an extensible parallel library for N-body interactions utilizing the FMM algorithm. A prominent feature of this library is that it is designed to be extensible, with a view to unifying efforts involving many algorithms based on the same principles as the FMM and enabling easy development of scientific application codes. The paper also details an exhaustive model for the computation of tree-based N-body algorithms in parallel, including both work estimates and communications estimates. With this model, we are able to implement a method to provide automatic, a priori load balancing of the parallel execution, achieving optimal distribution of the computational work among processors and minimal inter-processor communications. The library was extensively verified and tested using a client application that computes the velocity induced by N vortex particles in two dimensions. Strong scaling results are presented with 10 million particles on up to 256 processors, including both speedup and parallel efficiency. The largest problem run with the PetFMM library at this point was 64 million particles on 64 processors. The library is currently able to achieve over 85% parallel efficiency for 64 processes. The performance study, computational model, and application demonstrations presented in this paper are limited to 2D. However, the software architecture was designed to make an extension of this work to 3D straightforward, as the framework is templated over the dimension.
The software library is open source under the PETSc license, which is even less restrictive than the BSD license; this maximizes the impact on the scientific community and encourages peer-based collaboration on extensions and applications. Copyright © 2010 John Wiley & Sons, Ltd.
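
The strong-scaling figures quoted above (speedup and parallel efficiency at a fixed problem size) follow the standard definitions S = T_1 / T_p and E = S / p. The short Python sketch below only illustrates those definitions; the function names and the timings are hypothetical and are not taken from PetFMM or from the paper's measurements.

```python
def speedup(t_serial: float, t_parallel: float) -> float:
    """Strong-scaling speedup S = T_1 / T_p for a fixed problem size."""
    if t_serial <= 0 or t_parallel <= 0:
        raise ValueError("timings must be positive")
    return t_serial / t_parallel


def parallel_efficiency(t_serial: float, t_parallel: float, n_procs: int) -> float:
    """Parallel efficiency E = S / p = T_1 / (p * T_p)."""
    if n_procs < 1:
        raise ValueError("n_procs must be at least 1")
    return speedup(t_serial, t_parallel) / n_procs


if __name__ == "__main__":
    # Hypothetical wall-clock timings in seconds (assumed for illustration,
    # not results reported in the paper).
    t1, t64 = 640.0, 11.5
    print(f"speedup    = {speedup(t1, t64):.1f}")
    print(f"efficiency = {parallel_efficiency(t1, t64, 64):.2f}")
```

Under these definitions, an efficiency above 85% on 64 processes corresponds to a speedup of more than about 54 relative to the single-process run.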
