Efficient Management of Cache Accesses to Boost GPGPU Memory Subsystem Performance
[1] Mike O'Connor, et al. Cache-Conscious Wavefront Scheduling, 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.
[2] Yun Liang, et al. An efficient compiler framework for cache bypassing on GPUs, 2013, ICCAD 2013.
[3] Yu Wang, et al. Optimizing Cache Bypassing and Warp Scheduling for GPUs, 2018, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.
[4] Scott A. Mahlke, et al. Mascar: Speeding up GPU warps by reducing memory pitstops, 2015, 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA).
[5] Mattan Erez, et al. A locality-aware memory hierarchy for energy-efficient GPU architectures, 2013, 2013 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[6] Jianfei Wang, et al. Incorporating selective victim cache into GPGPU for high‐performance computing, 2017, Concurr. Comput. Pract. Exp.
[7] Julio Sahuquillo, et al. Improving GPU Cache Hierarchy Performance with a Fetch and Replacement Cache, 2018, Euro-Par.
[8] Xinxin Mei, et al. Dissecting GPU Memory Hierarchy Through Microbenchmarking, 2015, IEEE Transactions on Parallel and Distributed Systems.
[9] Song Huang, et al. On the energy efficiency of graphics processing units for scientific computing, 2009, 2009 IEEE International Symposium on Parallel & Distributed Processing.
[10] Jizhou Sun, et al. Elastic-Cache: GPU Cache Architecture for Efficient Fine- and Coarse-Grained Cache-Line Management, 2017, 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[11] David R. Kaeli, et al. Multi2Sim: A simulation framework for CPU-GPU computing, 2012, 2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT).
[12] Henk Corporaal, et al. A detailed GPU cache model based on reuse distance theory, 2014, 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA).
[13] Jose Renau, et al. An energy efficient GPGPU memory hierarchy with tiny incoherent caches, 2013, International Symposium on Low Power Electronics and Design (ISLPED).
[14] Won Woo Ro, et al. APRES: Improving Cache Efficiency by Exploiting Load Characteristics on GPUs, 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[15] Jung Ho Ahn, et al. A Comprehensive Memory Modeling Tool and Its Application to the Design and Analysis of Future Memory Hierarchies, 2008, 2008 International Symposium on Computer Architecture.
[16] Shuaiwen Song, et al. Locality-Driven Dynamic GPU Cache Bypassing, 2015, ICS.
[17] Mohammad Arjomand, et al. Architecting the Last-Level Cache for GPUs using STT-RAM Technology, 2015, ACM Trans. Design Autom. Electr. Syst.
[18] Kevin Skadron, et al. Pannotia: Understanding irregular GPGPU graph applications, 2013, 2013 IEEE International Symposium on Workload Characterization (IISWC).
[19] Daniel W. Chang, et al. Studying Victim Caches in GPUs, 2018, 2018 26th Euromicro International Conference on Parallel, Distributed and Network-based Processing (PDP).
[20] Kyu Yeun Kim, et al. IACM: Integrated adaptive cache management for high-performance and energy-efficient GPGPU computing, 2016, 2016 IEEE 34th International Conference on Computer Design (ICCD).
[21] Tao Zhang, et al. Energy-Efficient eDRAM-Based On-Chip Storage Architecture for GPGPUs, 2016, IEEE Transactions on Computers.
[22] José Duato, et al. Accurately modeling the on-chip and off-chip GPU memory subsystem, 2017, Future Gener. Comput. Syst.
[23] William J. Dally, et al. Unifying Primary Cache, Scratch, and Register File Memories in a Throughput Processor, 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.
[24] Marco Maggioni, et al. Dissecting the NVIDIA Volta GPU Architecture via Microbenchmarking, 2018, ArXiv.
[25] Zhihua Wang, et al. Orchestrating Cache Management and Memory Scheduling for GPGPU Applications, 2014, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.
[26] Kevin Skadron, et al. Rodinia: A benchmark suite for heterogeneous computing, 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).
[27] Sergios Petridis, et al. Performance and energy characterization of high-performance low-cost cornerness detection on GPUs and multicores, 2014, IISA 2014, The 5th International Conference on Information, Intelligence, Systems and Applications.
[28] Margaret Martonosi, et al. MRPB: Memory request prioritization for massively parallel processors, 2014, 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA).
[29] Xuhao Chen, et al. Adaptive Cache Management for Energy-Efficient GPU Computing, 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.
[30] Qin Wang, et al. IBOM: An Integrated and Balanced On-Chip Memory for High Performance GPGPUs, 2018, IEEE Transactions on Parallel and Distributed Systems.
[31] Daniel A. Jiménez, et al. Adaptive GPU cache bypassing, 2015, GPGPU@PPoPP.