Paper information - Scaling the GCR Solver Using a High-Level Stencil Framework on Multi- and Many-Core Architectures

The recent advent of novel multi- and many-core architectures forces application programmers to deal with hardware-specific implementation details and to be familiar with software optimization techniques in order to benefit from new high-performance computing machines. Extra care must be taken with communication-intensive algorithms, which may become a bottleneck in the forthcoming era of exascale computing. This paper presents a high-level stencil framework, implemented for the EULAG model, that efficiently utilizes heterogeneous clusters. Only efficient use of both CPUs and GPUs, together with a flexible data decomposition method, can lead to the maximum performance that scales a communication-intensive elliptic solver with a preconditioner.

Krzysztof Kurowski | Piotr Kopta | Milosz Ciznicki | Michal Kulczewski