Performance evaluation of OpenMP's target construct on GPUs - exploring compiler optimisations