HOMP: Automated Distribution of Parallel Loops and Data in Highly Parallel Accelerator-Based Systems
Kirk W. Cameron | Yonghong Yan | Jiawen Liu | Mariam Umar
[1] Michael F. P. O'Boyle,et al. Portable mapping of data parallel programs to OpenCL for heterogeneous systems , 2013, Proceedings of the 2013 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).
[2] Bronis R. de Supinski,et al. Early Experiences with the OpenMP Accelerator Model , 2013, IWOMP.
[3] David J. Lilja,et al. Parallel Loop Scheduling for High Performance Computers , 1995.
[4] Thomas B. Jablin,et al. Automatic execution of single-GPU computations across multiple GPUs , 2014, 2014 23rd International Conference on Parallel Architecture and Compilation Techniques (PACT).
[5] Gagan Agrawal,et al. Compiler and runtime support for enabling generalized reduction computations on heterogeneous parallel configurations , 2010, ICS '10.
[6] Keshav Pingali,et al. Adaptive heterogeneous scheduling for integrated GPUs , 2014, 2014 23rd International Conference on Parallel Architecture and Compilation Techniques (PACT).
[7] Thierry Gautier,et al. Exploiting Concurrent GPU Operations for Efficient Work Stealing on Multi-GPUs , 2012, 2012 IEEE 24th International Symposium on Computer Architecture and High Performance Computing.
[8] Roger W. Hockney,et al. The Communication Challenge for MPP: Intel Paragon and Meiko CS-2 , 1994, Parallel Computing.
[9] Yi Yang,et al. Semi-automatic restructuring of offloadable tasks for many-core accelerators , 2013, 2013 SC - International Conference for High Performance Computing, Networking, Storage and Analysis (SC).
[10] Bronis R. de Supinski,et al. CoreTSAR: Adaptive Worksharing for Heterogeneous Systems , 2014, ISC.
[11] Steven J. Deitz,et al. User-defined distributions and layouts in Chapel: philosophy and framework , 2010.
[12] Guy L. Steele,et al. The High Performance Fortran Handbook , 1993.
[13] Bronis R. de Supinski,et al. Supporting multiple accelerators in high-level programming models , 2015, PMAM '15.
[14] Saman P. Amarasinghe,et al. Portable performance on heterogeneous architectures , 2013, ASPLOS '13.
[15] Vivek Sarkar,et al. X10: an object-oriented approach to non-uniform cluster computing , 2005, OOPSLA '05.
[16] Uday Bondhugula,et al. Automatic data allocation and buffer management for multi-GPU machines , 2013, TACO.
[17] Michael F. P. O'Boyle,et al. A Static Task Partitioning Approach for Heterogeneous Systems Using OpenCL , 2011, CC.
[18] Hyesoon Kim,et al. An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness , 2009, ISCA '09.
[19] Christopher D. Carothers,et al. Heterogeneous concurrent execution of Monte Carlo photon transport on CPU, GPU and MIC , 2014, IA3 '14.
[20] Gregory Diamos,et al. Harmony: an execution model and runtime for heterogeneous many core systems , 2008, HPDC '08.
[21] Richard W. Vuduc,et al. Tuned and wildly asynchronous stencil kernels for hybrid CPU/GPU systems , 2009, ICS.
[22] Katherine Yelick,et al. UPC: Distributed Shared-Memory Programming , 2003.
[23] Cédric Augonnet,et al. StarPU: a unified platform for task scheduling on heterogeneous multicore architectures , 2011, Concurr. Comput. Pract. Exp..
[24] Gagan Agrawal,et al. A dynamic scheduling framework for emerging heterogeneous systems , 2011, 2011 18th International Conference on High Performance Computing.
[25] William Gropp,et al. An adaptive performance modeling tool for GPU architectures , 2010, PPoPP '10.
[26] Jack J. Dongarra,et al. Multi-GPU Implementation of LU Factorization , 2012, ICCS.
[27] Scott A. Mahlke,et al. SKMD: Single Kernel on Multiple Devices for Transparent CPU-GPU Collaboration , 2015, ACM Trans. Comput. Syst..
[28] Samuel Williams,et al. Roofline: an insightful visual performance model for multicore architectures , 2009, CACM.
[29] Hyesoon Kim,et al. Qilin: Exploiting parallelism on heterogeneous multiprocessors with adaptive mapping , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[30] Sunita Chandrasekaran,et al. Exploring Programming Multi-GPUs Using OpenMP and OpenACC-Based Hybrid Model , 2013, 2013 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum.