Efficient GPU synchronization without scopes: Saying no to complex consistency models
[1] M. Hill, et al. Weak ordering - a new definition, 1990, Proceedings of the 17th Annual International Symposium on Computer Architecture.
[2] Milo M. K. Martin, et al. Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset, 2005, CARN.
[3] Henry Wong, et al. Analyzing CUDA workloads using a detailed GPU simulator, 2009, 2009 IEEE International Symposium on Performance Analysis of Systems and Software.
[4] Kevin Skadron, et al. Rodinia: A benchmark suite for heterogeneous computing, 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).
[5] Jung Ho Ahn, et al. McPAT: An integrated power, area, and timing modeling framework for multicore and manycore architectures, 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[6] Niraj K. Jha, et al. GARNET: A detailed on-chip network model inside a full-system simulator, 2009, 2009 IEEE International Symposium on Performance Analysis of Systems and Software.
[7] Aaftab Munshi, et al. The OpenCL specification, 2009, 2009 IEEE Hot Chips 21 Symposium (HCS).
[8] Kevin Skadron, et al. A characterization of the Rodinia benchmark suite with comparison to contemporary CMP workloads, 2010, IEEE International Symposium on Workload Characterization (IISWC'10).
[9] Memory models, 2010, Commun. ACM.
[10] Sarita V. Adve, et al. DeNovo: Rethinking the Memory Hierarchy for Disciplined Parallelism, 2011, 2011 International Conference on Parallel Architectures and Compilation Techniques.
[11] John D. Owens, et al. Efficient Synchronization Primitives for GPUs, 2011, ArXiv.
[12] Wen-mei W. Hwu, et al. Parboil: A Revised Benchmark Suite for Scientific and Commercial Throughput Computing, 2012.
[13] Keshav Pingali, et al. A quantitative study of irregular programs on GPUs, 2012, 2012 IEEE International Symposium on Workload Characterization (IISWC).
[14] Sarita V. Adve, et al. DeNovoND: efficient hardware support for disciplined non-determinism, 2013, ASPLOS '13.
[15] Ganesh Gopalakrishnan, et al. Towards shared memory consistency models for GPUs, 2013, ICS '13.
[16] Nam Sung Kim, et al. GPUWattch: enabling energy optimizations in GPGPUs, 2013, ISCA.
[17] Daniel J. Sorin, et al. Evaluating cache coherent shared virtual memory for heterogeneous multicore chips, 2013, 2013 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
[18] David A. Wood, et al. Heterogeneous system coherence for integrated CPU-GPU systems, 2013, 2013 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[19] Daniel J. Sorin, et al. Exploring memory consistency for massively-threaded throughput-oriented processors, 2013, ISCA.
[20] Mike O'Connor, et al. Cache coherence for GPU architectures, 2013, 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA).
[21] Kevin Skadron, et al. Pannotia: Understanding irregular GPGPU graph applications, 2013, 2013 IEEE International Symposium on Workload Characterization (IISWC).
[22] David A. Wood, et al. Heterogeneous-race-free memory models, 2014, ASPLOS.
[23] Christopher Batten, et al. Accelerating Irregular Algorithms on GPGPUs Using Fine-Grain Hardware Worklists, 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.
[24] David A. Wood, et al. QuickRelease: A throughput-oriented approach to release consistency on GPUs, 2014, 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA).
[25] Tyler Sorensen, et al. ICS: U: Towards Shared Memory Consistency Models for GPUs, 2014.
[26] Hans-Juergen Boehm, et al. Outlawing ghosts: avoiding out-of-thin-air results, 2014, MSPC@PLDI.
[27] Derek Hower, et al. HRF-Relaxed: Adapting HRF to the Complexities of Industrial Heterogeneous Memory Models, 2015, TACO.
[28] Snehasish Kumar, et al. Fusion: Design tradeoffs in coherent cache hierarchies for accelerators, 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).
[29] Sarita V. Adve, et al. Stash: Have your scratchpad and cache it too, 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).
[30] Sarita V. Adve, et al. DeNovoSync: Efficient Support for Arbitrary Synchronization without Writer-Initiated Invalidations, 2015, ASPLOS.
[31] Ganesh Gopalakrishnan, et al. GPU Concurrency: Weak Behaviours and Programming Assumptions, 2015, ASPLOS.
[32] David A. Wood, et al. Synchronization Using Remote-Scope Promotion, 2015, ASPLOS.