GPUs Cache Performance Estimation using Reuse Distance Analysis
Gopinath Chennupati | Stephan Eidenbenz | Abdel-Hameed A. Badawy | Yehia Arafa | Nandakishore Santhi | Atanu Barai
[1] Satyajayant Misra,et al. A Scalable Analytical Memory Model for CPU Performance Prediction , 2017, PMBS@SC.
[2] Henk Corporaal,et al. A detailed GPU cache model based on reuse distance theory , 2014, 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA).
[3] Steve Carr,et al. Reuse-distance-based miss-rate prediction on a per instruction basis , 2004, MSP '04.
[4] Henry Wong,et al. Analyzing CUDA workloads using a detailed GPU simulator , 2009, 2009 IEEE International Symposium on Performance Analysis of Systems and Software.
[5] David A. Padua,et al. Estimating cache misses and locality using stack distances , 2003, ICS '03.
[6] Wen-mei W. Hwu,et al. Efficient performance evaluation of memory hierarchy for highly multithreaded graphics processors , 2012, PPoPP '12.
[7] Mohamed Zahran,et al. Efficient utilization of GPGPU cache hierarchy , 2015, GPGPU@PPoPP.
[8] Donald Yeung,et al. Guiding Locality Optimizations for Graph Computations via Reuse Distance Analysis , 2017, IEEE Computer Architecture Letters.
[9] Shuaiwen Song,et al. Locality-Driven Dynamic GPU Cache Bypassing , 2015, ICS.
[10] Yutao Zhong,et al. Predicting whole-program locality through reuse distance analysis , 2003, PLDI.
[11] Amir Rajabzadeh,et al. Efficient Cache Performance Modeling in GPUs Using Reuse Distance Analysis , 2018, ACM Trans. Archit. Code Optim.
[12] Tao Tang,et al. Cache Miss Analysis for GPU Programs Based on Stack Distance Profile , 2011, 2011 31st International Conference on Distributed Computing Systems.
[13] Kevin Skadron,et al. Rodinia: A benchmark suite for heterogeneous computing , 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).
[14] Dongwei Wang,et al. A reuse distance based performance analysis on GPU L1 data cache , 2016, 2016 IEEE 35th International Performance Computing and Communications Conference (IPCCC).
[15] Mahmut T. Kandemir,et al. Studying inter-core data reuse in multicores , 2011, SIGMETRICS '11.
[16] Gopinath Chennupati,et al. PPT-GPU: Scalable GPU Performance Modeling , 2019, IEEE Computer Architecture Letters.
[17] Lifan Xu,et al. Auto-tuning a high-level language targeted to GPU codes , 2012, 2012 Innovative Parallel Computing (InPar).
[18] Gopinath Chennupati,et al. An analytical memory hierarchy model for performance prediction , 2017, 2017 Winter Simulation Conference (WSC).
[19] Sudhakar Yalamanchili,et al. Ocelot: A dynamic optimization framework for bulk-synchronous applications in heterogeneous systems , 2010, 2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT).
[20] Gopinath Chennupati,et al. Low Overhead Instruction Latency Characterization for NVIDIA GPGPUs , 2019, 2019 IEEE High Performance Extreme Computing Conference (HPEC).
[21] Marco Maggioni,et al. Dissecting the NVIDIA Volta GPU Architecture via Microbenchmarking , 2018, ArXiv.
[22] Bin Wang. Mitigating GPU Memory Divergence for Data-Intensive Applications , 2015.
[23] Kristof Beyls,et al. Reuse Distance as a Metric for Cache Behavior , 2001.
[24] Gopinath Chennupati,et al. Scalable Performance Prediction of Codes with Memory Hierarchy and Pipelines , 2019, SIGSIM-PADS.
[25] Chen Ding,et al. Program locality analysis using reuse distance , 2009, TOPL.
[26] John E. Stone,et al. OpenCL: A Parallel Programming Standard for Heterogeneous Computing Systems , 2010, Computing in Science & Engineering.
[27] Donald Yeung,et al. Optimizing locality in graph computations using reuse distance profiles , 2017, 2017 IEEE 36th International Performance Computing and Communications Conference (IPCCC).
[28] Arun Parakh,et al. Performance Estimation of GPUs with Cache , 2012, 2012 IEEE 26th International Parallel and Distributed Processing Symposium Workshops & PhD Forum.
[29] Yun Liang,et al. An efficient compiler framework for cache bypassing on GPUs , 2013, ICCAD 2013.
[30] David W. Nellans,et al. Flexible software profiling of GPU architectures , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).
[31] Derek L. Schuff,et al. Multicore-aware reuse distance analysis , 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing, Workshops and Phd Forum (IPDPSW).
[32] Erik Hagersten,et al. StatCache: a probabilistic approach to efficient and accurate data locality analysis , 2004, IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS), 2004.