Task Parallelism in the WRF Model Through Computation Offloading to Many-Core Devices

In the last decade, the use of hybrid hardware (e.g., multicore processors combined with coprocessors) has grown in the HPC field. However, the WRF model has not fully exploited this evolution in HPC hardware, because its scalability is limited when a large number of computing units is used. In previous work, we proposed an asynchronous architecture for WRF that overlaps the radiation computation with the execution of the rest of the model. In this work, we extend this idea to exploit the computational power offered by hybrid hardware platforms. Specifically, we implement an OpenMP version of the asynchronous architecture and add support for two types of coprocessors: a Xeon Phi and a GPU. The experimental evaluation shows that our proposal adequately exploits these secondary computing devices, achieving notable runtime reductions when solving test cases drawn from real-world scenarios.