Efficient Cache Performance Modeling in GPUs Using Reuse Distance Analysis