Roofline Analysis and Performance Optimization of the MGB Hydrological Model

The Roofline model gives insight into the performance behavior of applications bounded by either memory or processor limits, providing useful guidelines for performance improvements. This work applies the Roofline model to the analysis of the MGB model, which simulates hydrological processes in large-scale watersheds. Real-world input data are used to characterize performance on two multicore architectures, one with CPUs only and one with CPUs and a GPU. MGB performance is improved through optimizations for better memory use and through shared-memory (OpenMP) and GPU (OpenACC) parallelism. CPU performance reaches 42.51% and 50.17% of the respective system’s peak, whereas GPU performance is low due to overheads caused by the MGB model structure.
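
To make the memory-bound versus compute-bound distinction concrete, the basic Roofline bound can be sketched as below. This is a minimal illustration of the general Roofline formula, not the analysis code used in this work. The function names and the machine numbers in the usage lines (500 GFLOP/s peak, 100 GB/s bandwidth) are hypothetical and do not describe either of the systems evaluated here.

```python
def attainable_gflops(arithmetic_intensity, peak_gflops, peak_bandwidth_gbs):
    """Roofline bound: the lower of the compute roof and the memory roof.

    arithmetic_intensity: FLOPs performed per byte moved to/from memory.
    peak_gflops: peak floating-point rate of the machine (GFLOP/s).
    peak_bandwidth_gbs: peak memory bandwidth of the machine (GB/s).
    """
    if arithmetic_intensity <= 0:
        raise ValueError("arithmetic intensity must be positive")
    memory_roof = peak_bandwidth_gbs * arithmetic_intensity
    return min(peak_gflops, memory_roof)


def ridge_point(peak_gflops, peak_bandwidth_gbs):
    """Arithmetic intensity at which a kernel stops being memory-bound."""
    return peak_gflops / peak_bandwidth_gbs


def is_memory_bound(arithmetic_intensity, peak_gflops, peak_bandwidth_gbs):
    """True if the kernel sits to the left of the ridge point."""
    return arithmetic_intensity < ridge_point(peak_gflops, peak_bandwidth_gbs)


# Hypothetical machine: 500 GFLOP/s peak, 100 GB/s memory bandwidth.
PEAK, BW = 500.0, 100.0
low_ai_bound = attainable_gflops(0.5, PEAK, BW)    # limited by the memory roof
high_ai_bound = attainable_gflops(10.0, PEAK, BW)  # limited by the compute roof
```

A kernel whose arithmetic intensity falls below the ridge point benefits most from memory optimizations, while one above it benefits from vectorization or parallelism that raises the achieved floating-point rate.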