Integrating Multi-threading and Accelerators into DUNE-ISTL

A major challenge in PDE software is balancing user-level flexibility against performance on heterogeneous hardware. We discuss how this challenge can be tackled, taking as our example the DUNE framework and, in particular, its linear algebra and solver components. We demonstrate how the previously MPI-only implementation is extended to support MPI+[CPU/GPU] threading and vectorisation. To this end, we devise a novel block extension of the recently proposed SELL-C-σ sparse matrix format. Benchmark computations underline the efficiency of our approach, exhibiting reasonable speedups over the CPU-MPI-only case.
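
For readers unfamiliar with the SELL-C-σ format mentioned above, the following C++ sketch illustrates the standard scalar version. Within each window of σ rows, rows are sorted by descending length. Consecutive groups of C rows then form a chunk, which is padded to the width of its longest row and stored column-major, so that the C rows of a chunk map onto C SIMD lanes. This is a minimal illustration under stated assumptions: a square matrix supplied in CSR form, padding entries that hold value 0 and point at column 0, and a result vector that is un-permuted at the end. The names `CSR`, `SellCSigma`, `toSell` and `spmv` are hypothetical and not part of the DUNE-ISTL API. The block extension devised in this work is not reproduced here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical minimal CSR input with scalar entries (square matrix assumed).
struct CSR {
    std::size_t n;                     // number of rows
    std::vector<std::size_t> row_ptr;  // size n+1, start of each row
    std::vector<std::size_t> col;      // column index of each nonzero
    std::vector<double> val;           // value of each nonzero
};

// Hypothetical container for the standard (scalar) SELL-C-sigma layout.
struct SellCSigma {
    std::size_t n = 0, C = 0, sigma = 0;
    std::vector<std::size_t> perm;       // perm[slot] = original row stored in that slot
    std::vector<std::size_t> chunk_ptr;  // offset of each chunk in col/val
    std::vector<std::size_t> chunk_len;  // padded width of each chunk
    std::vector<std::size_t> col;        // column indices, column-major per chunk
    std::vector<double> val;             // values, column-major per chunk
};

SellCSigma toSell(const CSR& A, std::size_t C, std::size_t sigma) {
    assert(C > 0 && sigma > 0);
    SellCSigma S;
    S.n = A.n; S.C = C; S.sigma = sigma;
    auto len = [&](std::size_t r) { return A.row_ptr[r + 1] - A.row_ptr[r]; };

    // Sort rows by descending length inside each window of sigma rows.
    S.perm.resize(A.n);
    std::iota(S.perm.begin(), S.perm.end(), std::size_t{0});
    for (std::size_t w = 0; w < A.n; w += sigma) {
        std::size_t end = std::min(w + sigma, A.n);
        std::stable_sort(S.perm.begin() + w, S.perm.begin() + end,
                         [&](std::size_t a, std::size_t b) { return len(a) > len(b); });
    }

    // Each chunk of C rows is padded to the length of its longest row.
    std::size_t nchunks = (A.n + C - 1) / C;
    S.chunk_ptr.assign(nchunks + 1, 0);
    S.chunk_len.assign(nchunks, 0);
    for (std::size_t c = 0; c < nchunks; ++c) {
        std::size_t width = 0;
        for (std::size_t l = 0; l < C && c * C + l < A.n; ++l)
            width = std::max(width, len(S.perm[c * C + l]));
        S.chunk_len[c] = width;
        S.chunk_ptr[c + 1] = S.chunk_ptr[c] + width * C;
    }

    // Padding entries keep value 0 and column 0, so they contribute nothing.
    S.col.assign(S.chunk_ptr[nchunks], 0);
    S.val.assign(S.chunk_ptr[nchunks], 0.0);
    for (std::size_t c = 0; c < nchunks; ++c)
        for (std::size_t l = 0; l < C && c * C + l < A.n; ++l) {
            std::size_t r = S.perm[c * C + l];
            for (std::size_t j = 0; j < len(r); ++j) {
                std::size_t idx = S.chunk_ptr[c] + j * C + l;  // entry j of lane l
                S.col[idx] = A.col[A.row_ptr[r] + j];
                S.val[idx] = A.val[A.row_ptr[r] + j];
            }
        }
    return S;
}

// y = A x; the inner loop over the C lanes is the part a compiler can vectorise.
std::vector<double> spmv(const SellCSigma& S, const std::vector<double>& x) {
    std::vector<double> y(S.n, 0.0);
    std::vector<double> tmp(S.C);  // one accumulator per SIMD lane
    for (std::size_t c = 0; c < S.chunk_len.size(); ++c) {
        std::fill(tmp.begin(), tmp.end(), 0.0);
        for (std::size_t j = 0; j < S.chunk_len[c]; ++j)
            for (std::size_t l = 0; l < S.C; ++l) {
                std::size_t idx = S.chunk_ptr[c] + j * S.C + l;
                tmp[l] += S.val[idx] * x[S.col[idx]];
            }
        for (std::size_t l = 0; l < S.C && c * S.C + l < S.n; ++l)
            y[S.perm[c * S.C + l]] = tmp[l];  // undo the row permutation
    }
    return y;
}
```

The parameter σ trades padding against locality. Setting σ = 1 disables sorting and yields plain SELL-C, while larger windows group rows of similar length and so reduce the number of padded zeros, at the price of moving rows farther from their original order.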
