Porting a sparse linear algebra math library to Intel GPUs

With the announcement that the Aurora supercomputer will be composed of general-purpose Intel CPUs complemented by discrete high-performance Intel GPUs, and with the deployment of the oneAPI ecosystem, Intel has committed to entering the arena of discrete high-performance GPUs. A central requirement for the scientific computing community is the availability of production-ready software stacks, along with an early indication of the performance they can expect on Intel high-performance GPUs. In this paper, we present the first platform-portable open-source math library supporting Intel GPUs via the DPC++ programming environment. We also benchmark some of the developed sparse linear algebra functionality on different Intel GPUs to assess how efficiently the DPC++ programming ecosystem translates raw performance into application performance. Aside from quantifying the efficiency within the hardware-specific roofline model, we also compare against routines providing the same functionality that ship with Intel’s oneMKL vendor library.
