Exploiting Concurrent GPU Operations for Efficient Work Stealing on Multi-GPUs
Thierry Gautier | Vincent Danjean | Nicolas Maillard | João V. F. Lima
[1] Cédric Augonnet, et al. StarPU: a unified platform for task scheduling on heterogeneous multicore architectures, 2011, Concurr. Comput. Pract. Exp.
[2] Julien Langou, et al. A Class of Parallel Tiled Linear Algebra Algorithms for Multicore Architectures, 2007, Parallel Comput.
[3] Robert A. van de Geijn, et al. Solving dense linear systems on platforms with multiple hardware accelerators, 2009, PPoPP '09.
[4] Gerson G. H. Cavalheiro, et al. Athapascan-1: On-line building data flow graph in a parallel language, 1998, Proceedings. 1998 International Conference on Parallel Architectures and Compilation Techniques (Cat. No.98EX192).
[5] Thierry Gautier, et al. KAAPI: A thread scheduling runtime system for data flow computations on cluster of multi-processors, 2007, PASCO '07.
[6] Jack Dongarra, et al. A Class of Hybrid LAPACK Algorithms for Multicore and GPU Architectures, 2011, 2011 Symposium on Application Accelerators in High-Performance Computing.
[7] Jérémie Allard, et al. Multi-GPU and Multi-CPU Parallelization for Interactive Physics Simulations, 2010, Euro-Par.
[8] Alejandro Duran, et al. Productive Programming of GPU Clusters with OmpSs, 2012, 2012 IEEE 26th International Parallel and Distributed Processing Symposium.
[9] Matteo Frigo, et al. The implementation of the Cilk-5 multithreaded language, 1998, PLDI.
[10] Thierry Gautier, et al. The X-Kaapi's Application Programming Interface. Part I: Data Flow Programming, 2011.
[11] Emmanuel Agullo, et al. QR Factorization on a Multicore Node Enhanced with Multiple GPU Accelerators, 2011, 2011 IEEE International Parallel & Distributed Processing Symposium.
[12] Jack J. Dongarra, et al. Towards dense linear algebra for hybrid GPU accelerated manycore systems, 2009, Parallel Comput.
[13] Eduard Ayguadé, et al. An Extension of the StarSs Programming Model for Platforms with Multiple GPUs, 2009, Euro-Par.
[14] Jack Dongarra, et al. QUARK Users' Guide: QUeueing And Runtime for Kernels, 2011.
[15] James Reinders, et al. Intel® threading building blocks, 2008.
[16] Thierry Gautier, et al. libKOMP, an Efficient OpenMP Runtime System for Both Fork-Join and Data Flow Paradigms, 2012, IWOMP.