Is Data Placement Optimization Still Relevant on Newer GPUs?
Barbara M. Chapman | Chunhua Liao | Pei-Hung Lin | Larisa Stoltzfus | Murali Emani | Md Abdullah Shahneous Bari