Evaluating the Performance of Integer Sum Reduction on an Intel GPU
[1] Hal Finkel, et al. Nuclear Reactor Simulation on OpenCL FPGA: a Case Study of RSBench, 2018, IWOCL.
[2] Hal Finkel, et al. A Case Study of k-means Clustering using SYCL, 2019, 2019 IEEE International Conference on Big Data (Big Data).
[3] Eddy Z. Zhang, et al. Massive atomics for massive parallelism on GPUs, 2014, ISMM '14.
[4] Christian Robert Trott, et al. Performance Portability of a Wilson Dslash Stencil Operator Mini-App Using Kokkos and SYCL, 2019, 2019 IEEE/ACM International Workshop on Performance, Portability and Productivity in HPC (P3HPC).
[5] Thomas Steinke, et al. Porting a Legacy CUDA Stencil Code to oneAPI, 2020, 2020 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW).
[6] Brian Homerding, et al. Evaluating the Performance of the hipSYCL Toolchain for HPC Kernels on NVIDIA V100 GPUs, 2020, IWOCL.
[7] John E. Stone, et al. OpenCL: A Parallel Programming Standard for Heterogeneous Computing Systems, 2010, Computing in Science & Engineering.
[8] Simon McIntosh-Smith, et al. Evaluating the performance of HPC-style SYCL applications, 2020, IWOCL.
[9] George A. Constantinides, et al. A Case for Work-stealing on FPGAs with OpenCL Atomics, 2016, FPGA.
[10] Wu-chun Feng, et al. Performance Characterization and Optimization of Atomic Operations on AMD GPUs, 2011, 2011 IEEE International Conference on Cluster Computing.
[11] Ben Ashbaugh. Debugging and Analyzing Programs Using the Intercept Layer for OpenCL Applications, 2018, IWOCL.
[12] Roberto Torres, et al. Algorithmic strategies for optimizing the parallel reduction primitive in CUDA, 2012, 2012 International Conference on High Performance Computing & Simulation (HPCS).
[13] John Freeman, et al. From OpenCL to high-performance hardware on FPGAs, 2012, 22nd International Conference on Field Programmable Logic and Applications (FPL).
[14] Hal Finkel, et al. Evaluation of Medical Imaging Applications using SYCL, 2019, 2019 IEEE International Conference on Bioinformatics and Biomedicine (BIBM).
[15] Yinan Ke, et al. neoSYCL: a SYCL implementation for SX-Aurora TSUBASA, 2021, HPC Asia.
[16] Rafael Asenjo, et al. Efficiency and productivity for decision making on low-power heterogeneous CPU+GPU SoCs, 2020, The Journal of Supercomputing.