Simplifying programming and load balancing of data parallel applications on heterogeneous systems
[1] Jungwon Kim,et al. Achieving a single compute device image in OpenCL for multiple GPUs, 2011, PPoPP '11.
[2] Scott A. Mahlke,et al. Transparent CPU-GPU collaboration for data-parallel kernels on heterogeneous systems , 2013, Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques.
[3] R. Govindarajan,et al. Fluidic Kernels: Cooperative Execution of OpenCL Programs on Multiple Heterogeneous Devices, 2014, CGO '14.
[4] Jungwon Kim,et al. SnuCL: an OpenCL framework for heterogeneous CPU/GPU clusters , 2012, ICS '12.
[5] Scott A. Mahlke,et al. SKMD: Single Kernel on Multiple Devices for Transparent CPU-GPU Collaboration, 2015, ACM Trans. Comput. Syst.
[6] Rafael Asenjo,et al. Strategies for maximizing utilization on multi-CPU and multi-GPU heterogeneous architectures , 2014, The Journal of Supercomputing.
[7] Jaejin Lee,et al. Automatic OpenCL work-group size selection for multicore CPUs , 2013, Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques.
[8] Jeffrey S. Vetter,et al. Maestro: Data Orchestration and Tuning for OpenCL Devices , 2010, Euro-Par.
[9] Jack J. Dongarra,et al. Unified Development for Mixed Multi-GPU and Multi-coprocessor Environments Using a Lightweight Runtime Environment , 2014, 2014 IEEE 28th International Parallel and Distributed Processing Symposium.
[10] Ziming Zhong,et al. Data Partitioning on Multicore and Multi-GPU Platforms Using Functional Performance Models , 2015, IEEE Transactions on Computers.
[11] Kevin Skadron,et al. Rodinia: A benchmark suite for heterogeneous computing , 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).
[12] Keshav Pingali,et al. Adaptive heterogeneous scheduling for integrated GPUs , 2014, 2014 23rd International Conference on Parallel Architecture and Compilation (PACT).
[13] Jianlong Zhong,et al. Kernelet: High-Throughput GPU Kernel Executions with Dynamic Slicing and Scheduling , 2013, IEEE Transactions on Parallel and Distributed Systems.
[14] Hyesoon Kim,et al. Qilin: Exploiting parallelism on heterogeneous multiprocessors with adaptive mapping , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[15] Francisco Almeida,et al. Dynamic load balancing on heterogeneous multicore/multiGPU systems , 2010, 2010 International Conference on High Performance Computing & Simulation.
[16] Pablo Toharia,et al. Static Multi-device Load Balancing for OpenCL , 2012, 2012 IEEE 10th International Symposium on Parallel and Distributed Processing with Applications.
[17] Bronis R. de Supinski,et al. Heterogeneous Task Scheduling for Accelerated OpenMP , 2012, 2012 IEEE 26th International Parallel and Distributed Processing Symposium.
[18] Bruno Raffin,et al. XKaapi: A Runtime System for Data-Flow Task Programming on Heterogeneous Architectures , 2013, 2013 IEEE 27th International Symposium on Parallel and Distributed Processing.
[19] Kevin Skadron,et al. Load balancing in a changing world: dealing with heterogeneity and performance variability , 2013, CF '13.