Accelerating Stencil-Based Computations by Increased Temporal Locality on Modern Multi- and Many-Core Architectures

Stencil computations arise in a wide range of applications in the computational sciences. This paper focuses on stencil computations arising in a biomedical simulation. Compute-intensive biomedical simulations are an attractive application for the Cell Broadband Engine Architecture (CBEA) and for graphics processing units (GPUs) used as hardware accelerators. Because stencil computations have low arithmetic intensity and the compute hardware has limited bandwidth, the achieved performance is usually only a fraction of peak performance. We detail an implementation of parallel stencil computations on the CBEA and GPUs that improves performance by exploiting temporal locality, and we report performance improvements over CPU implementations.
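To make the idea of exploiting temporal locality concrete, the following is a minimal Python sketch of one common approach, overlapped tiling with halo cells, applied to a 1D three-point averaging stencil. It is an illustration under stated assumptions, not the CBEA or GPU implementation described in the paper: the abstract does not say which blocking scheme is used, and the stencil, the function names (`jacobi_step`, `naive`, `time_blocked`), and the parameters (`steps`, `block`) are all hypothetical.

```python
def jacobi_step(u):
    """One sweep of a 3-point averaging stencil; the two end values are held fixed."""
    n = len(u)
    v = list(u)
    for i in range(1, n - 1):
        v[i] = (u[i - 1] + u[i] + u[i + 1]) / 3.0
    return v


def naive(u, steps):
    """Reference version: every time step streams over the whole grid."""
    for _ in range(steps):
        u = jacobi_step(u)
    return u


def time_blocked(u, steps, block):
    """Overlapped tiling (illustrative assumption, not the paper's scheme).

    Each tile is loaded once together with `steps` halo cells per side and
    then advanced `steps` time steps locally, so the data is reused across
    time steps instead of being re-read from main memory on every sweep.
    """
    n = len(u)
    out = list(u)
    for start in range(1, n - 1, block):
        end = min(start + block, n - 1)   # interior cells [start, end)
        lo = max(start - steps, 0)        # left halo, clamped at the boundary
        hi = min(end + steps, n)          # right halo, clamped at the boundary
        tile = u[lo:hi]
        for _ in range(steps):
            tile = jacobi_step(tile)
        # Stale halo values advance inward one cell per step, so after
        # `steps` sweeps only the halo is wrong; keep the interior cells.
        out[start:end] = tile[start - lo:end - lo]
    return out
```

For any grid `u`, `time_blocked(u, steps, block)` should reproduce `naive(u, steps)`. The price of the reuse is redundant computation in the halo regions, which grows with `steps`, so the number of fused time steps is a tuning parameter.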
