Statistical pattern based modeling of GPU memory access streams
Reena Panda | Andreas Gerstlauer | Lizy Kurian John | Jiajun Wang | Xinnian Zheng
[1] Kevin Skadron,et al. Rodinia: A benchmark suite for heterogeneous computing , 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).
[2] Richard W. Vuduc,et al. Many-Thread Aware Prefetching Mechanisms for GPGPU Applications , 2010, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture.
[3] Reena Panda,et al. Prefetching Techniques for Near-memory Throughput Processors , 2016, ICS.
[4] David A. Wood,et al. gem5-gpu: A Heterogeneous CPU-GPU Simulator , 2015, IEEE Computer Architecture Letters.
[5] Lieven Eeckhout,et al. Performance Cloning: A Technique for Disseminating Proprietary Applications as Benchmarks , 2006, 2006 IEEE International Symposium on Workload Characterization.
[6] Carole-Jean Wu,et al. Characterizing the latency hiding ability of GPUs , 2014, 2014 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
[7] B. Jacob,et al. CMP$im: A Pin-Based On-The-Fly Multi-Core Cache Simulator , 2008.
[8] Reena Panda,et al. Accurate address streams for LLC and beyond (SLAB): A methodology to enable system exploration , 2017, 2017 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
[9] Henry Wong,et al. Analyzing CUDA workloads using a detailed GPU simulator , 2009, 2009 IEEE International Symposium on Performance Analysis of Systems and Software.
[10] Irving L. Traiger,et al. Evaluation Techniques for Storage Hierarchies , 1970, IBM Syst. J..
[11] Lizy Kurian John,et al. Synthesizing memory-level parallelism aware miniature clones for SPEC CPU2006 and ImplantBench workloads , 2010, 2010 IEEE International Symposium on Performance Analysis of Systems & Software (ISPASS).
[12] Onur Mutlu,et al. Ramulator: A Fast and Extensible DRAM Simulator , 2016, IEEE Computer Architecture Letters.
[13] Tao Tang,et al. Cache Miss Analysis for GPU Programs Based on Stack Distance Profile , 2011, 2011 31st International Conference on Distributed Computing Systems.
[14] Hai Jin,et al. GPGPU-MiniBench: Accelerating GPGPU Micro-Architecture Simulation , 2015, IEEE Transactions on Computers.
[15] Hyesoon Kim,et al. An analytical model for a GPU architecture with memory-level and thread-level parallelism awareness , 2009, ISCA '09.
[16] Alper Sen,et al. MINIME-GPU , 2016, ACM Trans. Archit. Code Optim..
[17] Henk Corporaal,et al. A detailed GPU cache model based on reuse distance theory , 2014, 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA).
[18] Yan Solihin,et al. STM: Cloning the spatial and temporal memory access behavior , 2014, 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA).
[19] Richard W. Vuduc,et al. A performance analysis framework for identifying potential benefits in GPGPU applications , 2012, PPoPP '12.