Improving First Level Cache Efficiency for GPUs Using Dynamic Line Protection
暂无分享,去创建一个
[1] P. Glaskowsky. NVIDIA ’ s Fermi : The First Complete GPU Computing Architecture , 2009 .
[2] Mike O'Connor,et al. Cache-Conscious Wavefront Scheduling , 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.
[3] Samira Manabi Khan,et al. Sampling Dead Block Prediction for Last-Level Caches , 2010, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture.
[4] Kurt Keutzer,et al. A map reduce framework for programming graphics processors , 2010 .
[5] Shuaiwen Song,et al. Locality-Driven Dynamic GPU Cache Bypassing , 2015, ICS.
[6] Mateo Valero,et al. Improving Cache Management Policies Using Dynamic Reuse Distances , 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.
[7] Kurt Keutzer,et al. Fast support vector machine training and classification on graphics processors , 2008, ICML '08.
[8] Mahmut T. Kandemir,et al. Neither more nor less: Optimizing thread-level parallelism for GPGPUs , 2013, Proceedings of the 22nd International Conference on Parallel Architectures and Compilation Techniques.
[9] Henry Wong,et al. Analyzing CUDA workloads using a detailed GPU simulator , 2009, 2009 IEEE International Symposium on Performance Analysis of Systems and Software.
[10] Lifan Xu,et al. Auto-tuning a high-level language targeted to GPU codes , 2012, 2012 Innovative Parallel Computing (InPar).
[11] Margaret Martonosi,et al. Characterizing and improving the use of demand-fetched caches in GPUs , 2012, ICS '12.
[12] Daniel A. Jiménez,et al. Adaptive GPU cache bypassing , 2015, GPGPU@PPoPP.
[13] Carole-Jean Wu,et al. Ctrl-C: Instruction-Aware Control Loop Based Adaptive Cache Bypassing for GPUs , 2016, 2016 IEEE 34th International Conference on Computer Design (ICCD).
[14] John Kim,et al. Improving GPGPU resource utilization through alternative thread block scheduling , 2014, 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA).
[15] Naga K. Govindaraju,et al. Mars: A MapReduce Framework on graphics processors , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[16] J. Ramanujam,et al. Automatic C-to-CUDA Code Generation for Affine Programs , 2010, CC.
[17] Wen-mei W. Hwu,et al. Parboil: A Revised Benchmark Suite for Scientific and Commercial Throughput Computing , 2012 .
[18] Aamer Jaleel,et al. High performance cache replacement using re-reference interval prediction (RRIP) , 2010, ISCA.
[19] William J. Dally,et al. The GPU Computing Era , 2010, IEEE Micro.
[20] Kevin Skadron,et al. Rodinia: A benchmark suite for heterogeneous computing , 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).
[21] Xuhao Chen,et al. Adaptive Cache Management for Energy-Efficient GPU Computing , 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.
[22] Yu Wang,et al. Coordinated static and dynamic cache bypassing for GPUs , 2015, 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA).
[23] William J. Dally,et al. Energy-efficient mechanisms for managing thread context in throughput processors , 2011, 2011 38th Annual International Symposium on Computer Architecture (ISCA).