SYCL is a single-source programming model for heterogeneous systems. Compared with accelerator-specific programming models, it promises improved maintainability, productivity, and opportunities for compiler optimization. Several implementations of the SYCL standard have been developed over the past few years, with backends built on contemporary accelerator languages such as OpenCL, CUDA, and HIP. These implementations vary widely both in their performance and in their support for specific features of the standard. As SYCL grows in popularity, developers need to know how each feature is implemented across the popular implementations in order to make informed design choices. In this paper, we evaluate how existing SYCL implementations support important SYCL features across a range of hardware, in order to understand SYCL’s performance and portability. We use the newest SYCL benchmark suite (SYCL-Bench, 38 kernels) to evaluate four existing implementations, comparing language-feature support across backends and highlighting both feature completeness and performance. For features, we focus on the five major SYCL parallel constructs, using the matrix multiplication benchmark as a motivating example. Our results show that the basic data-parallel construct is the best choice for performance on current SYCL implementations, and we identify opportunities for improvement in several of the SYCL implementations.
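To make the term "basic data-parallel construct" concrete for matrix multiplication: each work-item in a two-dimensional index space computes exactly one element of the output matrix. In SYCL 2020 the kernel lambda below would be submitted through `handler::parallel_for` over a `sycl::range<2>{n, m}` and would receive a `sycl::id<2>`. The following is a minimal sketch under that assumption, not the SYCL-Bench kernel: `Id2` and `parallel_for_2d` are hypothetical serial stand-ins so that the example compiles with any C++ compiler, without a SYCL toolchain.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for sycl::id<2>: the global index of one work-item.
struct Id2 {
  std::size_t row;
  std::size_t col;
};

// Hypothetical serial stand-in for handler::parallel_for(sycl::range<2>, kernel).
// It invokes the kernel once per point of the rows x cols index space; a real
// SYCL runtime may run these invocations concurrently and in any order.
template <typename Kernel>
void parallel_for_2d(std::size_t rows, std::size_t cols, Kernel kernel) {
  for (std::size_t r = 0; r < rows; ++r)
    for (std::size_t c = 0; c < cols; ++c)
      kernel(Id2{r, c});
}

// Computes C = A * B for row-major A (n x k) and B (k x m).
// Each "work-item" computes one element of C: the basic data-parallel mapping.
std::vector<float> matmul(const std::vector<float>& a,
                          const std::vector<float>& b,
                          std::size_t n, std::size_t k, std::size_t m) {
  assert(a.size() == n * k && b.size() == k * m);  // shape precondition
  std::vector<float> c(n * m, 0.0f);
  parallel_for_2d(n, m, [&](Id2 idx) {
    float sum = 0.0f;
    for (std::size_t p = 0; p < k; ++p)
      sum += a[idx.row * k + p] * b[p * m + idx.col];
    c[idx.row * m + idx.col] = sum;  // each work-item writes a distinct element
  });
  return c;
}
```

Because every work-item writes a distinct element of C, no synchronization between work-items is needed, and with a plain range (rather than an `nd_range`) the implementation is free to choose the work-group size. The other parallel constructs trade this simplicity for explicit control over work-groups and local memory.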