Runtime Support for Adaptive Spatial Partitioning and Inter-Kernel Communication on GPUs
David R. Kaeli | Dana Schaa | Perhaad Mistry | Yash Ukidave | Charu Kalra
[1] Giulio Giunta,et al. A GPGPU Transparent Virtualization Component for High Performance Computing Clouds, 2010, Euro-Par.
[2] David R. Kaeli,et al. Multi2Sim: A simulation framework for CPU-GPU computing , 2012, 2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT).
[3] Kenneth Moreland,et al. The FFT on a GPU , 2003, HWWS '03.
[4] Margaret Martonosi,et al. Reducing GPU offload latency via fine-grained CPU-GPU synchronization , 2013, 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA).
[5] Robert J. Harrison,et al. Adapting Irregular Computations to Large CPU-GPU Clusters in the MADNESS Framework , 2012, 2012 IEEE International Conference on Cluster Computing.
[6] Hiroshi Matsuo,et al. RaVioli: a GPU Supported High-Level Pseudo Real-time Video Processing Library, 2011.
[7] Kevin Skadron,et al. Load balancing in a changing world: dealing with heterogeneity and performance variability , 2013, CF '13.
[8] Kevin Skadron,et al. A characterization of the Rodinia benchmark suite with comparison to contemporary CMP workloads , 2010, IEEE International Symposium on Workload Characterization (IISWC'10).
[9] Robert Ricci,et al. Augmenting Operating Systems With the GPU , 2013, ArXiv.
[10] Mateo Valero,et al. Enabling preemptive multiprogramming on GPUs , 2014, 2014 ACM/IEEE 41st International Symposium on Computer Architecture (ISCA).
[11] Assaf Schuster,et al. Processing data streams with hard real-time constraints on heterogeneous systems , 2011, ICS '11.
[12] David A. Wood,et al. QuickRelease: A throughput-oriented approach to release consistency on GPUs , 2014, 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA).
[13] Wen-mei W. Hwu,et al. Parboil: A Revised Benchmark Suite for Scientific and Commercial Throughput Computing, 2012.
[14] Kevin Skadron,et al. Fine-grained resource sharing for concurrent GPGPU kernels , 2012, HotPar'12.
[15] R. Govindarajan,et al. Improving GPGPU concurrency with elastic kernels , 2013, ASPLOS '13.
[16] John Kubiatowicz,et al. GPUs as an opportunity for offloading garbage collection , 2012, ISMM '12.
[17] David R. Kaeli,et al. Valar: a benchmark suite to study the dynamic behavior of heterogeneous systems , 2013, GPGPU@ASPLOS.
[18] Klaus H. Hinrichs,et al. Texturing techniques for terrain visualization , 2000, IEEE Visualization.
[19] David A. Bader. Designing Scalable Synthetic Compact Applications for Benchmarking High Productivity Computing Systems, 2006.
[20] David R. Kaeli,et al. Analyzing program flow within a many-kernel OpenCL application , 2011, GPGPU-4.