Selection of Task Implementations in the Nanos++ Runtime
Eduard Ayguadé | Jesús Labarta | Rosa M. Badia | Judit Planas
[1] Rudolf Eigenmann, et al. OpenMPC: Extended OpenMP Programming and Tuning for GPUs, 2010, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.
[2] Cédric Augonnet, et al. StarPU: a unified platform for task scheduling on heterogeneous multicore architectures, 2011, Concurr. Comput. Pract. Exp.
[3] Alejandro Duran, et al. A Proposal to Extend the OpenMP Tasking Model for Heterogeneous Architectures, 2009, IWOMP.
[4] Daniel A. Brokenshire, et al. Introduction to the Cell Broadband Engine Architecture, 2007, IBM J. Res. Dev.
[5] Jack Dongarra, et al. An Improved MAGMA GEMM for Fermi GPUs, 2010.
[6] R. Dolbeau, et al. HMPP™: A Hybrid Multi-core Parallel Programming Environment, 2022.
[7] Bradley C. Kuszmaul, et al. Cilk: an efficient multithreaded runtime system, 1995, PPOPP '95.
[8] Anand Raghunathan, et al. MDR: performance model driven runtime for heterogeneous parallel platforms, 2011, ICS '11.
[9] Andrew Richards, et al. Offload - Automating Code Migration to Heterogeneous Multicore Systems, 2010, HiPEAC.
[10] Tarek S. Abdelrahman, et al. hiCUDA: High-Level GPGPU Programming, 2011, IEEE Transactions on Parallel and Distributed Systems.
[11] Jesús Labarta, et al. A dependency-aware task-based programming environment for multi-core architectures, 2008, 2008 IEEE International Conference on Cluster Computing.
[12] Cédric Augonnet, et al. A Unified Runtime System for Heterogeneous Multi-core Architectures, 2009, Euro-Par Workshops.
[13] Alejandro Duran, et al. Productive Programming of GPU Clusters with OmpSs, 2012, 2012 IEEE 26th International Parallel and Distributed Processing Symposium.
[14] Scott B. Baden, et al. Mint: realizing CUDA performance in 3D stencil methods with annotated C, 2011, ICS '11.
[15] Alejandro Duran, et al. Extending the OpenMP Tasking Model to Allow Dependent Tasks, 2008, IWOMP.
[16] Alejandro Duran, et al. Extending OpenMP to Survive the Heterogeneous Multi-Core Era, 2010, International Journal of Parallel Programming.
[17] Wen-mei W. Hwu, et al. CUDA-Lite: Reducing GPU Programming Complexity, 2008, LCPC.
[18] William J. Dally, et al. Compilation for explicitly managed memory hierarchies, 2007, PPOPP.