Architecture exploration of recent GPUs to analyze the efficiency of hardware resources
Cheol Hong Kim | Viet Vo