Efficient parallel reduction on GPUs with Hipacc
Jürgen Teich | Frank Hannig | Oliver Reiche | Bo Qiao | M. Akif Özkan
[1] Jürgen Teich, et al. Automatic Kernel Fusion for Image Processing DSLs, 2018, SCOPES.
[2] Jinjun Xiong, et al. Accelerating reduction and scan using tensor core units, 2018, ICS.
[3] Jürgen Teich, et al. HIPAcc: A Domain-Specific Language and Compiler for Image Processing, 2016, IEEE Transactions on Parallel and Distributed Systems.
[4] Isaac N. Bankman, et al. Handbook of medical image processing and analysis, 2009.
[5] Jürgen Teich, et al. From Loop Fusion to Kernel Fusion: A Domain-Specific Approach to Locality Optimization, 2019, 2019 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).
[6] Nathan Bell, et al. Thrust: A Productivity-Oriented Library for CUDA, 2012.
[7] Roberto Torres, et al. Algorithmic strategies for optimizing the parallel reduction primitive in CUDA, 2012, 2012 International Conference on High Performance Computing & Simulation (HPCS).
[8] Shubhabrata Sengupta, et al. Efficient Parallel Scan Algorithms for GPUs, 2011.
[9] Jürgen Teich, et al. Unveiling kernel concurrency in multiresolution filters on GPUs with an image processing DSL, 2020, GPGPU@PPoPP.
[10] Uday Bondhugula, et al. PolyMage: Automatic Optimization for Image Processing Pipelines, 2015, ASPLOS.
[11] Frédo Durand, et al. Halide: a language and compiler for optimizing parallelism, locality, and recomputation in image processing pipelines, 2013, PLDI 2013.
[12] Simon D. Hammond, et al. Automatic Generation of Warp-Level Primitives and Atomic Instructions for Fast and Portable Parallel Reduction on GPUs, 2019, 2019 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).