Using heterogeneous computing for scattering prediction in scenarios with several source configurations

In this work, we present a tool for solving large scattering problems with several acoustic source configurations. These problems entail a large matrix multiplication in which the matrices must be generated on demand, so that problems can be solved on systems with less memory than would be needed to store the matrices in full. We have analysed and developed several versions: one based on multiple matrix-vector products, two different approaches built on tiled matrix multiplication, and one heterogeneous implementation that uses a GPU and a Xeon Phi simultaneously. To test these implementations, we have used different devices: multicore CPUs, a Xeon Phi accelerator, and a Tesla GPU. Compared to our initial work, the peak speedup of the new solutions is 25× for CPU, 17× for Phi, 20× for GPU, and 20× for the heterogeneous GPU + Phi implementation. Finally, the tool presented in this work can be adapted and applied to other fields whenever the problem to be solved requires a large matrix multiplication whose elements must be generated on demand (e.g. the inverse scattering problem in electromagnetics).
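
The idea of a tiled matrix multiplication with on-demand generation can be illustrated with a minimal sketch. This is not the implementation evaluated in this work; it is a plain-Python illustration under stated assumptions. The function name `tiled_on_demand_matmul`, the element generator `gen`, and the tile size are hypothetical, and the right-hand matrix `B` is assumed to hold one column per source configuration. Only one tile of the generated matrix is held in memory at any time.

```python
def tiled_on_demand_matmul(gen, n, m, B, tile=64):
    """Compute C = A @ B, where A is n x m and A[i][j] = gen(i, j).

    A is never stored in full: each tile is generated, used, and discarded.
    B is an m x s list of lists (assumed: one column per source configuration).
    `gen` is a hypothetical callback that returns one matrix element.
    """
    s = len(B[0])
    C = [[0] * s for _ in range(n)]
    for i0 in range(0, n, tile):
        i1 = min(i0 + tile, n)
        for j0 in range(0, m, tile):
            j1 = min(j0 + tile, m)
            # Generate only the current tile of A.
            a_tile = [[gen(i, j) for j in range(j0, j1)] for i in range(i0, i1)]
            # Accumulate the tile's contribution into the matching rows of C.
            for di, a_row in enumerate(a_tile):
                c_row = C[i0 + di]
                for dj, a in enumerate(a_row):
                    b_row = B[j0 + dj]
                    for c in range(s):
                        c_row[c] += a * b_row[c]
    return C
```

In this sketch the memory needed for the generated matrix is bounded by `tile * tile` elements regardless of `n` and `m`. Because all columns of `B` are processed while a tile is resident, each tile is generated once for all source configurations, whereas repeated matrix-vector products would regenerate it once per configuration.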
