Paper information - Optimization and Parallelization of Emedge3D on Shared Memory Architecture

Optimization and Parallelization of Emedge3D on Shared Memory Architecture

This paper presents a study of techniques used to speed up a scientific simulation code. The techniques include sequential optimizations as well as parallelization with OpenMP. The work is carried out on two different multicore shared-memory architectures: a cutting-edge 8×8-core CPU and a more common 2×6-core board. The target application is representative of many memory-bound codes, and the techniques presented show how to overcome the memory bandwidth limit, which is quickly reached on multi-core and many-core shared-memory architectures. To achieve efficient speedups, strategies are applied to lower computation costs and to maximize the use of processor caches. The optimizations are: minimizing memory accesses, simplifying and reordering computations, and tiling loops. On a 12-core Intel X5675 system, the combination of these optimizations yields an execution that is 21.6 times faster than the original version on one core.
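As a rough illustration of the loop tiling and OpenMP parallelization the abstract mentions, the sketch below tiles a matrix transpose so that each block of the source and destination stays in cache while it is processed. It is not taken from Emedge3D: the kernel, the function names, and the `TILE` value are assumptions chosen only to show the general pattern.

```c
#include <assert.h>
#include <stddef.h>

#define TILE 32 /* hypothetical tile edge; would be tuned to the cache size */

/* Naive out-of-place transpose: dst is written with stride n,
   so consecutive writes land on different cache lines. */
void transpose_naive(size_t n, const double *src, double *dst)
{
    assert(src != dst); /* out-of-place only */
    for (size_t i = 0; i < n; i++)
        for (size_t j = 0; j < n; j++)
            dst[j * n + i] = src[i * n + j];
}

/* Tiled transpose: the (ii, jj) loops walk over TILE x TILE blocks,
   and the inner loops stay inside one block. Each block writes a
   disjoint region of dst, so blocks can be distributed over threads. */
void transpose_tiled(size_t n, const double *src, double *dst)
{
    assert(src != dst); /* out-of-place only */
    #pragma omp parallel for collapse(2) schedule(static)
    for (size_t ii = 0; ii < n; ii += TILE)
        for (size_t jj = 0; jj < n; jj += TILE) {
            /* Clamp the block bounds when n is not a multiple of TILE. */
            size_t imax = (ii + TILE < n) ? ii + TILE : n;
            size_t jmax = (jj + TILE < n) ? jj + TILE : n;
            for (size_t i = ii; i < imax; i++)
                for (size_t j = jj; j < jmax; j++)
                    dst[j * n + i] = src[i * n + j];
        }
}
```

Compiled without OpenMP support, the pragma is ignored and the tiled version runs sequentially, which makes it easy to measure the effect of tiling separately from the effect of threading.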

Guillaume Latu | Matthieu Kuhn | Nicolas Crouseilles | Stéphane Genaud
