VLAG: A very fast locality approximation model for GPU kernels with regular access patterns
[1] Kyu Yeun Kim, et al. Quantifying the performance and energy efficiency of advanced cache indexing for GPGPU computing, 2016, Microprocess. Microsystems.
[2] Tao Tang, et al. Cache Miss Analysis for GPU Programs Based on Stack Distance Profile, 2011, 2011 31st International Conference on Distributed Computing Systems.
[3] David R. Kaeli, et al. Exploiting Memory Access Patterns to Improve Memory Performance in Data-Parallel Architectures, 2011, IEEE Transactions on Parallel and Distributed Systems.
[4] Henk Corporaal, et al. A detailed GPU cache model based on reuse distance theory, 2014, 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA).
[5] Sudhakar Yalamanchili, et al. Ocelot: A dynamic optimization framework for bulk-synchronous applications in heterogeneous systems, 2010, 2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT).
[6] Zhen Lin, et al. Automatic data placement into GPU on-chip memory resources, 2015, 2015 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).
[7] Michael Goesele, et al. Guided profiling for auto-tuning array layouts on GPUs, 2015, PMBS '15.
[8] Sangpil Lee, et al. Parallel GPU Architecture Simulation Framework Exploiting Architectural-Level Parallelism with Timing Error Prediction, 2016, IEEE Transactions on Computers.
[9] Jungwon Kim, et al. A Performance Model for GPUs with Caches, 2015, IEEE Transactions on Parallel and Distributed Systems.
[10] William Gropp, et al. An adaptive performance modeling tool for GPU architectures, 2010, PPoPP '10.
[11] Richard W. Vuduc, et al. A performance analysis framework for identifying potential benefits in GPGPU applications, 2012, PPoPP '12.
[12] K. Srinathan, et al. A performance prediction model for the CUDA GPGPU platform, 2009, 2009 International Conference on High Performance Computing (HiPC).
[13] Sudhakar Yalamanchili, et al. Modeling GPU-CPU workloads and systems, 2010, GPGPU-3.
[14] Hyesoon Kim, et al. An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness, 2009, ISCA '09.
[15] Henry Wong, et al. Analyzing CUDA workloads using a detailed GPU simulator, 2009, 2009 IEEE International Symposium on Performance Analysis of Systems and Software.
[16] Wen-mei W. Hwu, et al. Analytical Performance Prediction for Evaluation and Tuning of GPGPU Applications, 2009.
[17] Wen-mei W. Hwu, et al. What is ahead for parallel computing, 2014, J. Parallel Distributed Comput.
[18] Hsien-Hsin S. Lee, et al. GPUMech: GPU Performance Modeling Technique Based on Interval Analysis, 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.
[19] Jianliang Xu, et al. GPURoofline: A Model for Guiding Performance Optimizations on GPUs, 2012, Euro-Par.