Weighted dynamic scheduling with many parallelism grains for offloading of numerical workloads to multiple varied accelerators
Jack J. Dongarra | Stanimire Tomov | Piotr Luszczek | Azzam Haidar | Asim YarKhan | Yulu Jia
[1] Michael T. Goodrich, et al. A bridging model for parallel computation, communication, and I/O, 1996, CSUR.
[2] Jack J. Dongarra, et al. High performance matrix inversion based on LU factorization for multicore architectures, 2011, MTAGS '11.
[3] Monica S. Lam, et al. Jade: a high-level, machine-independent language for parallel programming, 1993, Computer.
[4] Jack J. Dongarra, et al. Unified Development for Mixed Multi-GPU and Multi-coprocessor Environments Using a Lightweight Runtime Environment, 2014, 2014 IEEE 28th International Parallel and Distributed Processing Symposium.
[5] Asim YarKhan, et al. Dynamic Task Execution on Shared and Distributed Memory Architectures, 2012.
[6] Jesús Labarta, et al. A dependency-aware task-based programming environment for multi-core architectures, 2008, 2008 IEEE International Conference on Cluster Computing.
[7] Julien Langou, et al. The Impact of Multicore on Math Software, 2006, PARA.
[8] Jack J. Dongarra, et al. Achieving numerical accuracy and high performance using recursive tile LU factorization with partial pivoting, 2014, Concurr. Comput. Pract. Exp.
[9] Jack J. Dongarra, et al. Exploiting Fine-Grain Parallelism in Recursive LU Factorization, 2011, PARCO.
[10] Leslie G. Valiant, et al. A bridging model for parallel computation, 1990, CACM.
[11] Basilio B. Fraguela, et al. A framework for argument-based task synchronization with automatic detection of dependencies, 2013, Parallel Comput.
[12] Yi Guo, et al. The habanero multicore software research project, 2009, OOPSLA Companion.
[13] Jack Dongarra, et al. QUARK Users' Guide: QUeueing And Runtime for Kernels, 2011.
[14] L. Dagum, et al. OpenMP: an industry standard API for shared-memory programming, 1998.
[15] Robert A. van de Geijn, et al. Supermatrix out-of-order scheduling of matrix operations for SMP and multi-core architectures, 2007, SPAA '07.
[16] Jack J. Dongarra, et al. Analysis of dynamically scheduled tile algorithms for dense linear algebra on multicore architectures, 2011, Concurr. Comput. Pract. Exp.
[17] P. Hanrahan, et al. Sequoia: Programming the Memory Hierarchy, 2006, ACM/IEEE SC 2006 Conference (SC'06).
[18] Jack J. Dongarra, et al. An Implementation of the Tile QR Factorization for a GPU and Multiple CPUs, 2010, PARA.
[19] Bradley C. Kuszmaul, et al. Cilk: an efficient multithreaded runtime system, 1995, PPOPP '95.
[20] Cédric Augonnet, et al. StarPU: a unified platform for task scheduling on heterogeneous multicore architectures, 2011, Concurr. Comput. Pract. Exp.