Towards a performance-portable description of geometric multigrid algorithms using a domain-specific language

Abstract: High-Performance Computing (HPC) systems are becoming increasingly heterogeneous. A single node can contain several processor types, including accelerators such as Graphics Processing Units (GPUs). To cope with the challenge of programming such complex systems, this work presents a domain-specific approach that automatically generates code tailored to different processor types. Instead of requiring hand-tuned code for GPU accelerators, low-level CUDA and OpenCL code is generated from a high-level description of an algorithm specified in a Domain-Specific Language (DSL). The DSL is part of the Heterogeneous Image Processing Acceleration (HIPAcc) framework and was extended in this work to handle grid hierarchies in order to model different cycle types. Language constructs are introduced to represent and process data at different resolutions. This makes it possible to describe image processing algorithms that work on image pyramids as well as multigrid methods in the stencil domain. By decoupling the algorithm from its schedule, the proposed approach enables the generation of efficient stencil code implementations. Our results show that the generated code can achieve performance similar to that of hand-tuned codes.
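To make the notions of a grid hierarchy and a cycle type concrete, the following is a minimal sketch of a geometric multigrid cycle for the 1D Poisson problem -u'' = f with homogeneous Dirichlet boundaries. It is plain Python and is not HIPAcc DSL syntax or generated code. All function names (`smooth`, `residual`, `restrict`, `prolong`, `cycle`), the weighted Jacobi smoother, and the full-weighting and linear-interpolation transfer operators are illustrative assumptions chosen for brevity. The parameter `gamma` selects the cycle type: 1 gives a V-cycle, 2 a W-cycle.

```python
# Illustrative sketch only: hypothetical helper names, not the HIPAcc API.
# Grids hold 2**k + 1 points including the two boundary points (fixed at 0).

def smooth(u, f, h, sweeps, omega=2.0 / 3.0):
    """Apply weighted Jacobi sweeps for -u'' = f, in place."""
    for _ in range(sweeps):
        old = u[:]
        for i in range(1, len(u) - 1):
            jacobi = 0.5 * (old[i - 1] + old[i + 1] + h * h * f[i])
            u[i] = (1.0 - omega) * old[i] + omega * jacobi
    return u

def residual(u, f, h):
    """Return r = f - A u for the 3-point stencil (-1, 2, -1) / h^2."""
    r = [0.0] * len(u)
    for i in range(1, len(u) - 1):
        r[i] = f[i] - (2.0 * u[i] - u[i - 1] - u[i + 1]) / (h * h)
    return r

def restrict(fine):
    """Full-weighting restriction to the next coarser grid."""
    nc = (len(fine) - 1) // 2 + 1
    coarse = [0.0] * nc
    for i in range(1, nc - 1):
        coarse[i] = 0.25 * fine[2 * i - 1] + 0.5 * fine[2 * i] + 0.25 * fine[2 * i + 1]
    return coarse

def prolong(coarse):
    """Linear interpolation to the next finer grid."""
    nf = 2 * (len(coarse) - 1) + 1
    fine = [0.0] * nf
    for i in range(len(coarse)):
        fine[2 * i] = coarse[i]
    for i in range(1, nf, 2):
        fine[i] = 0.5 * (fine[i - 1] + fine[i + 1])
    return fine

def cycle(u, f, h, gamma=1, pre=2, post=2):
    """One multigrid cycle, in place: gamma=1 is a V-cycle, gamma=2 a W-cycle."""
    if len(u) <= 3:
        # Coarsest grid has one interior point: solve 2 u / h^2 = f exactly.
        u[1] = 0.5 * h * h * f[1]
        return u
    smooth(u, f, h, pre)                      # pre-smoothing
    rc = restrict(residual(u, f, h))          # coarse-grid right-hand side
    ec = [0.0] * len(rc)                      # coarse-grid error, initial guess 0
    for _ in range(gamma):                    # recursion count sets the cycle type
        cycle(ec, rc, 2.0 * h, gamma, pre, post)
    e = prolong(ec)
    for i in range(1, len(u) - 1):            # coarse-grid correction
        u[i] += e[i]
    smooth(u, f, h, post)                     # post-smoothing
    return u
```

In this sketch the algorithm (smoother, transfer operators, recursion) is written once, while how each loop is mapped to hardware is left open. That separation is the part a code generator can exploit when targeting different processor types.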
