Spherical harmonic transform on heterogeneous architectures using hybrid programming

Thème NUM: Systèmes numériques
Équipe-Projet Grand-large
Rapport de recherche n° 7635, 15 April 2011, 14 pages

Abstract: Spherical Harmonic Transforms (SHT) are at the heart of many scientific and practical applications ranging from climate modeling to cosmological observations. In many of these areas, a new wave of exciting, cutting-edge science goals has recently been proposed, calling for simulations and analyses of actual experimental or observational data at very high resolutions, accompanied by the production or processing of unprecedented volumes of data. Both of these aspects pose a formidable challenge for existing implementations of the transforms.

This paper describes a multi CPU-GPU implementation of an inverse SHT, based on hybrid programming combining MPI and CUDA, and discusses its tests as motivated by these forthcoming applications. We present performance comparisons of the multi-GPU version and a hybrid MPI/OpenMP version of the same transform. We find that one NVIDIA Tesla S1070 can reduce the overall execution time of the SHT by as much as a factor of 3 with respect to the MPI/OpenMP version executed on one quad-core processor (Intel Nehalem 2.93 GHz) and that, owing to the very good scalability of both versions, 128 Tesla cards perform as well as 256 twelve-core processors (AMD Opteron 2.1 GHz).

Key-words: Spherical Harmonic Transforms, hybrid architectures, hybrid programming, OpenMP, CUDA, Multi-GPU