Efficient Data Communication between CPU and GPU through Transparent Partial-Page Migration
Li Shen | Shiqing Zhang | Zhiying Wang | YaoHua Yang
[1] Ján Veselý, et al. Observations and opportunities in architecting shared virtual memory for heterogeneous systems, 2016, 2016 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
[2] Stephen W. Keckler, et al. Page Placement Strategies for GPUs within Heterogeneous Memory Systems, 2015, ASPLOS.
[3] David A. Wood, et al. Supporting x86-64 address translation for 100s of GPU lanes, 2014, 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA).
[4] Henry Wong, et al. Analyzing CUDA workloads using a detailed GPU simulator, 2009, 2009 IEEE International Symposium on Performance Analysis of Systems and Software.
[5] Stéphan Jourdan, et al. Haswell: The Fourth-Generation Intel Core Processor, 2014, IEEE Micro.
[6] David Patterson, et al. The Top 10 Innovations in the New NVIDIA Fermi Architecture, and the Top 3 Next Challenges, 2009.
[7] David W. Nellans, et al. Handling the problems and opportunities posed by multiple on-chip memory controllers, 2010, 2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT).
[8] David Kirk, et al. NVIDIA CUDA software and GPU parallel computing architecture, 2007, ISMM '07.
[9] Paolo Prinetto, et al. Fault mitigation strategies for CUDA GPUs, 2013, 2013 IEEE International Test Conference (ITC).
[10] Erik Lindholm, et al. NVIDIA Tesla: A Unified Graphics and Computing Architecture, 2008, IEEE Micro.
[11] Margaret Martonosi, et al. Reducing GPU offload latency via fine-grained CPU-GPU synchronization, 2013, 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA).
[12] Phil Rogers, et al. Heterogeneous system architecture overview, 2013, 2013 IEEE Hot Chips 25 Symposium (HCS).
[13] Jasmin Ajanovic. PCI express 3.0 overview, 2009, 2009 IEEE Hot Chips 21 Symposium (HCS).
[14] William J. Dally, et al. GPUs and the Future of Parallel Computing, 2011, IEEE Micro.
[15] David W. Nellans, et al. Towards high performance paged memory for GPUs, 2016, 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[16] Jayshree Ghorpade, et al. GPGPU Processing in CUDA Architecture, 2012, ArXiv.
[17] Thomas F. Wenisch, et al. Unlocking bandwidth for GPUs in CC-NUMA systems, 2015, 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA).
[18] Zhao Zhang, et al. Flexible memory: A novel main memory architecture with block-level memory compression, 2015, 2015 IEEE International Conference on Networking, Architecture and Storage (NAS).
[19] Jaewon Lee, et al. ScaleGPU: GPU Architecture for Memory-Unaware GPU Programming, 2014, IEEE Computer Architecture Letters.
[20] Raphael Landaverde, et al. An investigation of Unified Memory Access performance in CUDA, 2014, 2014 IEEE High Performance Extreme Computing Conference (HPEC).