SemCache++: Semantics-Aware Caching for Efficient Multi-GPU Offloading