Scalable Fast Multipole Method for Electromagnetic Simulations

To exploit recent many-core architectures, HPC applications are turning to hybrid parallel programming that mixes MPI and OpenMP. Yet very few large-scale production applications today use asynchronous parallel tasks and asynchronous multithreaded communications to take full advantage of the available concurrency, in particular through dynamic load balancing and the overlapping of network and memory operations. In this paper, we present first results from an implementation of the multilevel fast multipole method (ML-FMM) that uses GASPI asynchronous one-sided communications to improve code scalability and performance. On 32 nodes, we show an 83.5% reduction in communication costs compared with the optimized MPI+OpenMP version.
