Paper information - A Domain-Specific Language and Compiler for Stencil Computations on Short-Vector SIMD and GPU Architectures

Stencil computations are an integral part of applications in a number of scientific computing domains, such as image processing and the numerical solution of partial differential equations. We describe a domain-specific language (DSL) for regular stencil computations that allows the computations to be specified concisely. We also describe a multi-target compiler for this DSL that generates optimized code for GPUs and for multi-core processors with short-vector SIMD engines. The hardware differences between these two types of architecture call for different optimization strategies in the compiler: a data layout transformation combined with split tiling is used for multi-core CPUs, while overlapped tiling is used for GPUs. We evaluate our domain-specific compiler on a number of benchmarks on CPU and GPU platforms.
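To make the term "overlapped tiling" concrete, the following is a minimal Python sketch, not the paper's DSL or the code its compiler generates. It assumes a 1D three-point averaging stencil with fixed boundary cells; the function names `jacobi_step`, `jacobi_reference`, and `jacobi_overlapped` are hypothetical and chosen only for illustration. Each tile copies a halo of `steps` extra cells on each side and recomputes them redundantly, so that tiles can advance several time steps without exchanging data.

```python
def jacobi_step(a):
    """One sweep of a 3-point averaging stencil; the two end cells are held fixed."""
    out = list(a)
    for i in range(1, len(a) - 1):
        out[i] = (a[i - 1] + a[i] + a[i + 1]) / 3.0
    return out


def jacobi_reference(a, steps):
    """Untiled baseline: apply the stencil to the whole array `steps` times."""
    for _ in range(steps):
        a = jacobi_step(a)
    return a


def jacobi_overlapped(a, steps, tile):
    """Overlapped tiling (illustrative): every tile is processed independently.

    A tile covering [start, end) reads the wider range [start - steps, end + steps),
    clipped to the array. Values near the artificial edges of that range go stale
    by one cell per time step, so after `steps` sweeps only the halo is wrong and
    the tile interior matches the untiled result.
    """
    n = len(a)
    result = list(a)
    for start in range(0, n, tile):
        end = min(start + tile, n)
        lo = max(start - steps, 0)   # left edge including halo
        hi = min(end + steps, n)     # right edge including halo
        local = a[lo:hi]
        for _ in range(steps):
            local = jacobi_step(local)
        # Keep only the tile interior; discard the redundantly computed halo.
        result[start:end] = local[start - lo:end - lo]
    return result


if __name__ == "__main__":
    data = [float((i * i) % 7) for i in range(20)]
    tiled = jacobi_overlapped(data, steps=3, tile=4)
    untiled = jacobi_reference(data, steps=3)
    print(all(abs(x - y) < 1e-12 for x, y in zip(tiled, untiled)))
```

The redundant halo work is the price paid for tile independence, which suits architectures where synchronization between tiles is expensive. Split tiling, which the abstract pairs with CPUs, avoids the redundant computation by scheduling tiles in dependent phases and is not shown here.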
