DyCache: Dynamic Multi-Grain Cache Management for Irregular Memory Accesses on GPU
Sheng Ma | Zhiying Wang | Libo Huang | Hui Guo | Yashuai Lü
[1] Olivier Temam, et al. Data caches for superscalar processors, 1997, ICS '97.
[2] Jizhou Sun, et al. Elastic-Cache: GPU Cache Architecture for Efficient Fine- and Coarse-Grained Cache-Line Management, 2017, 2017 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[3] Kevin Skadron, et al. Rodinia: A benchmark suite for heterogeneous computing, 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).
[4] Rajesh K. Gupta, et al. Adapting cache line size to application behavior, 1999, ICS '99.
[5] Xuhao Chen, et al. Adaptive Cache Management for Energy-Efficient GPU Computing, 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.
[6] Jeffrey B. Rothman, et al. The pool of subsectors cache design, 1999, ICS '99.
[7] Peter M. Kogge, et al. On the Memory Access Patterns of Supercomputer Applications: Benchmark Selection and Its Implications, 2007, IEEE Transactions on Computers.
[8] C. J. Conti, et al. Structural aspects of the System/360 Model 85, 1968.
[9] Yale N. Patt, et al. Line Distillation: Increasing Cache Capacity by Filtering Unused Words in Cache Lines, 2007, 2007 IEEE 13th International Symposium on High Performance Computer Architecture.
[10] Keshav Pingali, et al. A quantitative study of irregular programs on GPUs, 2012, 2012 IEEE International Symposium on Workload Characterization (IISWC).
[11] Jack W. Davidson, et al. Memory access coalescing: a technique for eliminating redundant memory accesses, 1994, PLDI '94.
[12] Ching-Yung Lin, et al. GraphBIG: understanding graph computing in the context of industrial solutions, 2015, SC15: International Conference for High Performance Computing, Networking, Storage and Analysis.
[13] Scott A. Mahlke, et al. WarpPool: Sharing requests with inter-warp coalescing for throughput processors, 2015, 2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[14] Yi Yang, et al. A GPGPU compiler for memory optimization and parallelism management, 2010, PLDI '10.
[15] Henry Wong, et al. Analyzing CUDA workloads using a detailed GPU simulator, 2009, 2009 IEEE International Symposium on Performance Analysis of Systems and Software.
[16] André Seznec, et al. Decoupled sectored caches: conciliating low tag implementation cost, 1994, ISCA '94.
[17] Naga K. Govindaraju, et al. Mars: A MapReduce Framework on graphics processors, 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[18] J. Ramanujam, et al. Automatic C-to-CUDA Code Generation for Affine Programs, 2010, CC.