Design and analysis of scheduling strategies for multi-CPU and multi-GPU architectures
Bruno Raffin | Vincent Danjean | Thierry Gautier | Nicolas Maillard | João V. F. Lima
[1] Jack J. Dongarra, et al. A scalable framework for heterogeneous GPU-based clusters, 2012, SPAA '12.
[2] Robert A. van de Geijn, et al. Solving dense linear systems on platforms with multiple hardware accelerators, 2009, PPoPP '09.
[3] Thierry Gautier, et al. Exploiting Concurrent GPU Operations for Efficient Work Stealing on Multi-GPUs, 2012, 2012 IEEE 24th International Symposium on Computer Architecture and High Performance Computing.
[4] Charles E. Leiserson, et al. Space-Efficient Scheduling of Multithreaded Computations, 1998, SIAM J. Comput.
[5] Julien Langou, et al. A Class of Parallel Tiled Linear Algebra Algorithms for Multicore Architectures, 2007, Parallel Comput.
[6] Alejandro Duran, et al. Productive Programming of GPU Clusters with OmpSs, 2012, 2012 IEEE 26th International Parallel and Distributed Processing Symposium.
[7] Jack J. Dongarra, et al. Dense Linear Algebra on Accelerated Multicore Hardware, 2012, High-Performance Scientific Computing.
[8] Salim Hariri, et al. Performance-Effective and Low-Complexity Task Scheduling for Heterogeneous Computing, 2002, IEEE Trans. Parallel Distributed Syst.
[9] Guy E. Blelloch, et al. The data locality of work stealing, 2000, SPAA.
[10] Bruno Raffin, et al. XKaapi: A Runtime System for Data-Flow Task Programming on Heterogeneous Architectures, 2013, 2013 IEEE 27th International Symposium on Parallel and Distributed Processing.
[11] Thierry Gautier, et al. KAAPI: A thread scheduling runtime system for data flow computations on cluster of multi-processors, 2007, PASCO '07.
[12] Thierry Gautier, et al. libKOMP, an Efficient OpenMP Runtime System for Both Fork-Join and Data Flow Paradigms, 2012, IWOMP.
[13] Cédric Augonnet, et al. StarPU: a unified platform for task scheduling on heterogeneous multicore architectures, 2011, Concurr. Comput. Pract. Exp.
[14] Eduard Ayguadé, et al. Implementing OmpSs support for regions of data in architectures with multiple address spaces, 2013, ICS '13.
[15] Gerson G. H. Cavalheiro, et al. Athapascan-1: On-line building data flow graph in a parallel language, 1998, Proceedings. 1998 International Conference on Parallel Architectures and Compilation Techniques (Cat. No.98EX192).
[16] Eduard Ayguadé, et al. Self-Adaptive OmpSs Tasks in Heterogeneous Environments, 2013, 2013 IEEE 27th International Symposium on Parallel and Distributed Processing.
[17] Laxmikant V. Kalé, et al. G-Charm: an adaptive runtime system for message-driven parallel applications on hybrid systems, 2013, ICS '13.
[18] Laxmikant V. Kalé, et al. Scaling Hierarchical N-body Simulations on GPU Clusters, 2010, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.
[19] Cédric Augonnet, et al. Data-Aware Task Scheduling on Multi-accelerator Based Platforms, 2010, 2010 IEEE 16th International Conference on Parallel and Distributed Systems.
[20] Yi Guo, et al. SLAW: A scalable locality-aware adaptive work-stealing scheduler, 2010, IPDPS.
[21] Matteo Frigo, et al. The implementation of the Cilk-5 multithreaded language, 1998, PLDI.
[22] Jesús Labarta, et al. Parallelizing dense and banded linear algebra libraries using SMPSs, 2009, Concurr. Comput. Pract. Exp.
[23] Jack Dongarra, et al. QUARK Users' Guide: QUeueing And Runtime for Kernels, 2011.
[24] Thomas Hérault, et al. DAGuE: A Generic Distributed DAG Engine for High Performance Computing, 2011, 2011 IEEE International Symposium on Parallel and Distributed Processing Workshops and Phd Forum.
[25] Eduard Ayguadé, et al. An Extension of the StarSs Programming Model for Platforms with Multiple GPUs, 2009, Euro-Par.
[26] Thierry Gautier, et al. X-Kaapi C programming interface, 2011.
[27] Thierry Gautier, et al. A New Programming Paradigm for GPGPU, 2012, Euro-Par.
[28] Jack Dongarra, et al. A Class of Hybrid LAPACK Algorithms for Multicore and GPU Architectures, 2011, 2011 Symposium on Application Accelerators in High-Performance Computing.
[29] Jérémie Allard, et al. Multi-GPU and Multi-CPU Parallelization for Interactive Physics Simulations, 2010, Euro-Par.
[30] Jack J. Dongarra, et al. Towards dense linear algebra for hybrid GPU accelerated manycore systems, 2009, Parallel Comput.