Paper Information - PATUS: A Code Generation and Auto-Tuning Framework For Parallel Stencil Computations


PATUS is a code generation and auto-tuning framework for stencil computations targeted at modern multi- and many-core processors, such as multicore CPUs and graphics processing units. Its ultimate goals are to provide a means towards productivity and performance on current and future multi- and many-core platforms. The framework generates the code for a compute kernel from a specification of the stencil operation and a Strategy: a description of the parallelization and optimization methods to be applied. We leverage the auto-tuning methodology to find the optimal hardware architecture-specific and Strategy-specific parameter configuration.
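To make the separation between the stencil, the Strategy, and the tuned parameters more concrete, here is a minimal Python sketch. It is not PATUS code and does not use the PATUS stencil or Strategy languages; the names `make_grid`, `jacobi_blocked`, and `autotune`, the 5-point averaging stencil, the loop-tiling traversal, and the exhaustive timing search are all assumptions chosen for illustration. The stencil expression stays fixed, the tiled loop nest stands in for a Strategy, and the tile shape is the parameter the tuner searches over.

```python
import itertools
import time


def make_grid(n):
    """Build a deterministic n x n grid of floats (hypothetical test input)."""
    return [[float((i * j) % 7) for j in range(n)] for i in range(n)]


def jacobi_blocked(u, block):
    """Apply one sweep of a 5-point averaging stencil, visiting the interior in tiles.

    `block` = (bi, bj) is the tile shape. It changes only the traversal order,
    never the value computed at each point.
    """
    n, m = len(u), len(u[0])
    bi, bj = block
    out = [row[:] for row in u]  # boundary values are copied unchanged
    for i0 in range(1, n - 1, bi):
        for j0 in range(1, m - 1, bj):
            for i in range(i0, min(i0 + bi, n - 1)):
                for j in range(j0, min(j0 + bj, m - 1)):
                    out[i][j] = 0.25 * (
                        u[i - 1][j] + u[i + 1][j] + u[i][j - 1] + u[i][j + 1]
                    )
    return out


def autotune(kernel, grid, candidates, repeats=3):
    """Time `kernel` for every candidate parameter set and return the fastest.

    Exhaustive search is used here only because it is the simplest option;
    it is an assumption of this sketch, not a statement about PATUS.
    """
    best_params, best_time = None, float("inf")
    for params in candidates:
        elapsed = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            kernel(grid, params)
            elapsed = min(elapsed, time.perf_counter() - start)
        if elapsed < best_time:
            best_params, best_time = params, elapsed
    return best_params, best_time


if __name__ == "__main__":
    grid = make_grid(64)
    tile_shapes = list(itertools.product([4, 8, 16, 32], repeat=2))
    params, seconds = autotune(jacobi_blocked, grid, tile_shapes)
    print("fastest tile shape on this machine:", params, "time:", seconds)
```

The tile shape that wins depends on the machine and the grid size, which is the reason for tuning per platform rather than fixing the parameters in the source. In pure Python the timing differences are small and noisy, so the sketch illustrates the workflow rather than a realistic speedup.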

Helmar Burkhart | Matthias Christen | Olaf Schenk

[1] Chun Chen, et al. Loop Transformation Recipes for Code Generation and Auto-Tuning, 2009, LCPC.

[2] Samuel Williams, et al. An auto-tuning framework for parallel multicore stencil computations, 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).

[3] Rudolf Eigenmann, et al. PEAK—a fast and effective performance tuning system via compiler optimization orchestration, 2008, TOPL.

[4] Chau-Wen Tseng, et al. Tiling Optimizations for 3D Scientific Computations, 2000, ACM/IEEE SC 2000 Conference (SC'00).

[5] Bradley C. Kuszmaul, et al. The Pochoir stencil compiler, 2011, SPAA '11.

[6] Zhiyuan Li, et al. Automatic tiling of iterative stencil loops, 2004, TOPL.

[7] Sanjay V. Rajopadhye, et al. Parameterized tiled loops for free, 2007, PLDI '07.

[8] Dhabaleswar K. Panda, et al. Scalable Earthquake Simulation on Petascale Supercomputers, 2010, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.

[9] Kevin Skadron, et al. A Performance Study for Iterative Stencil Loops on GPUs with Ghost Zone Optimizations, 2011, International Journal of Parallel Programming.

[10] Zhiyuan Li, et al. New tiling techniques to improve cache temporal locality, 1999, PLDI '99.

[11] Uday Bondhugula, et al. A practical automatic polyhedral parallelizer and locality optimizer, 2008, PLDI '08.

[12] Rudolf Eigenmann, et al. Cetus: A Source-to-Source Compiler Infrastructure for Multicores, 2009, Computer.

[13] Helmar Burkhart, et al. PATUS: A Code Generation and Autotuning Framework for Parallel Iterative Stencil Computations on Modern Microarchitectures, 2011, 2011 IEEE International Parallel & Distributed Processing Symposium.

[14] Peter Messmer, et al. Parallel data-locality aware stencil computations on modern micro-architectures, 2009, 2009 IEEE International Symposium on Parallel & Distributed Processing.

[15] Robert Strzodka, et al. Time skewing made simple, 2011, PPoPP '11.

[16] Gerhard Wellein, et al. Efficient Temporal Blocking for Stencil Computations by Multicore-Aware Wavefront Parallelization, 2009, 2009 33rd Annual IEEE International Computer Software and Applications Conference.

[17] Volker Strumpen, et al. Cache oblivious stencil computations, 2005, ICS '05.

[18] Scott B. Baden, et al. Mint: realizing CUDA performance in 3D stencil methods with annotated C, 2011, ICS '11.

[19] Zhiyuan Li, et al. A Compiler Framework for Tiling Imperfectly-Nested Loops, 1999, LCPC.

[20] Samuel Williams, et al. Optimization and Performance Modeling of Stencil Computations on Modern Microprocessors, 2007, SIAM Rev.