Paper information - Optimized Stencil Computation Using In-Place Calculation on Modern Multicore Systems

Numerical algorithms on parallel systems built upon modern multicore processors face two obstacles that keep realistic applications from reaching the theoretically available compute performance. First, parallelization has to be exploited to the full extent on several system levels. Second, the provision of data to the compute cores needs to be adapted to the constraints of a hardware-controlled nested cache hierarchy with shared resources. In this paper we analyze dedicated optimization techniques on modern multicore systems for stencil kernels on regular three-dimensional grids. We combine several methods, such as a compressed grid algorithm with finite shifts in each time step and loop skewing, into an optimized parallel in-place stencil implementation of the three-dimensional Laplacian operator. This approach reduces memory requirements by a factor of approximately two, and considerable performance gains are observed on modern Intel- and AMD-based multicore systems.
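To illustrate the compressed grid idea mentioned in the abstract, here is a minimal Python sketch, not the authors' implementation. It assumes a Jacobi-style update that averages the six nearest neighbours as a stand-in for the Laplacian stencil, and it assumes the shift is applied by one plane along the x axis only. The names `make_grid` and `sweep` are hypothetical. Instead of two full grids, the storage holds one grid plus a single spare plane, and each time step writes its results shifted by one plane, alternating the shift direction and the traversal order so that no value is overwritten before it has been read.

```python
def make_grid(nx, ny, nz, init):
    """Allocate (nx+3) x (ny+2) x (nz+2) storage: the grid with its boundary
    layers plus ONE spare plane along x. Data starts at plane offset 1."""
    s = [[[0.0] * (nz + 2) for _ in range(ny + 2)] for _ in range(nx + 3)]
    for i in range(nx + 2):
        for j in range(ny + 2):
            for k in range(nz + 2):
                s[i + 1][j][k] = init(i, j, k)
    return s


def sweep(s, nx, ny, nz, src):
    """One Jacobi-style step in place (assumed stencil: mean of 6 neighbours).

    Logical plane i is read from s[i + src] and written to s[i + dst],
    where dst = 1 - src. Returns dst, the plane offset of the new data.
    """
    dst = 1 - src
    # Shifting down (dst == 0) traverses planes in ascending order,
    # shifting up (dst == 1) in descending order, so every write lands
    # on a plane whose old values are no longer needed.
    planes = range(nx + 2) if dst == 0 else range(nx + 1, -1, -1)
    for i in planes:
        r = i + src  # storage plane holding the old values of logical plane i
        interior_i = 1 <= i <= nx
        for j in range(ny + 2):
            for k in range(nz + 2):
                if interior_i and 1 <= j <= ny and 1 <= k <= nz:
                    new = (s[r - 1][j][k] + s[r + 1][j][k]
                           + s[r][j - 1][k] + s[r][j + 1][k]
                           + s[r][j][k - 1] + s[r][j][k + 1]) / 6.0
                else:
                    new = s[r][j][k]  # boundary values are only moved
                s[i + dst][j][k] = new
    return dst


if __name__ == "__main__":
    nx, ny, nz = 4, 4, 4
    # Hot boundary plane at i == 0, zero everywhere else.
    grid = make_grid(nx, ny, nz, lambda i, j, k: 1.0 if i == 0 else 0.0)
    offset = 1
    for _ in range(10):
        offset = sweep(grid, nx, ny, nz, offset)
    print(grid[1 + offset][2][2])  # value at logical point (1, 2, 2)
```

In this sketch the storage has nx+3 planes, compared with 2(nx+2) planes for a conventional two-grid Jacobi scheme, which is where the memory reduction by roughly a factor of two comes from. Loop skewing and the parallelization discussed in the paper are not shown.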
