Quantifying the performance and energy efficiency of advanced cache indexing for GPGPU computing
[1] Xuhao Chen, et al. Adaptive Cache Management for Energy-Efficient GPU Computing, 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.
[2] Onur Mutlu, et al. Improving GPU performance via large warps and two-level warp scheduling, 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[3] Margaret Martonosi, et al. MRPB: Memory request prioritization for massively parallel processors, 2014, 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA).
[4] Yale N. Patt, et al. The V-Way cache: demand-based associativity via global replacement, 2005, 32nd International Symposium on Computer Architecture (ISCA'05).
[5] Mohamed Zahran, et al. Efficient utilization of GPGPU cache hierarchy, 2015, GPGPU@PPoPP.
[7] Seunghoe Kim, et al. On the Feasibility of Advanced Cache Indexing for High-Performance and Energy-Efficient GPGPU Computing, 2014, MES@ISCA.
[8] John Kim, et al. Improving GPGPU Resource Utilization and Performance Through Alternative Cooperative Thread Array Scheduling, 2014, HPCA 2014.
[9] Collin McCurdy, et al. The Scalable Heterogeneous Computing (SHOC) benchmark suite, 2010, GPGPU-3.
[10] Mike O'Connor, et al. Cache-Conscious Wavefront Scheduling, 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.
[11] Qing Yang, et al. A novel cache design for vector processing, 1992, ISCA '92.
[12] Alberto Ros, et al. Adaptive Selection of Cache Indexing Bits for Removing Conflict Misses, 2015, IEEE Transactions on Computers.
[13] Henry Wong, et al. Analyzing CUDA workloads using a detailed GPU simulator, 2009, 2009 IEEE International Symposium on Performance Analysis of Systems and Software.
[14] Weikuan Yu, et al. Eliminating intra-warp conflict misses in GPU, 2015, 2015 Design, Automation & Test in Europe Conference & Exhibition (DATE).
[15] Lifan Xu, et al. Auto-tuning a high-level language targeted to GPU codes, 2012, 2012 Innovative Parallel Computing (InPar).
[16] Jaejin Lee, et al. Using prime numbers for cache indexing to eliminate conflict misses, 2004, 10th International Symposium on High Performance Computer Architecture (HPCA'04).
[17] B. Ramakrishna Rau, et al. Pseudo-randomly interleaved memory, 1991, ISCA '91.
[18] Christoforos E. Kozyrakis, et al. The ZCache: Decoupling Ways and Associativity, 2010, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture.
[19] Yale N. Patt, et al. The V-Way Cache: Demand Based Associativity via Global Replacement, 2005, ISCA 2005.
[20] Jeffrey R. Diamond, et al. Arbitrary Modulus Indexing, 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.
[21] Naga K. Govindaraju, et al. Mars: A MapReduce Framework on graphics processors, 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[22] Nam Sung Kim, et al. GPUWattch: enabling energy optimizations in GPGPUs, 2013, ISCA.
[23] Gabriel H. Loh, et al. PIPP: promotion/insertion pseudo-partitioning of multi-core shared caches, 2009, ISCA '09.
[24] Henk Corporaal, et al. A detailed GPU cache model based on reuse distance theory, 2014, 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA).
[25] Kevin Skadron, et al. Rodinia: A benchmark suite for heterogeneous computing, 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).
[26] Mateo Valero, et al. Eliminating cache conflict misses through XOR-based placement functions, 1997, ICS '97.
[27] José González, et al. The design and performance of a conflict-avoiding cache, 1997, Proceedings of 30th Annual International Symposium on Microarchitecture.