Spandex: A Flexible Interface for Efficient Heterogeneous Coherence
[1] Margaret Martonosi,et al. COATCheck: Verifying Memory Ordering at the Hardware-OS Interface, 2016, ASPLOS.
[2] Keshav Pingali,et al. A quantitative study of irregular programs on GPUs , 2012, 2012 IEEE International Symposium on Workload Characterization (IISWC).
[3] Erik Hagersten,et al. Building Heterogeneous Unified Virtual Memories (UVMs) without the Overhead , 2016, ACM Trans. Archit. Code Optim.
[4] Kevin Skadron,et al. Pannotia: Understanding irregular GPGPU graph applications , 2013, 2013 IEEE International Symposium on Workload Characterization (IISWC).
[5] Christopher Batten,et al. Accelerating Irregular Algorithms on GPGPUs Using Fine-Grain Hardware Worklists , 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.
[6] Thomas F. Wenisch,et al. Selective GPU caches to eliminate CPU-GPU HW cache coherence , 2016, 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[7] Mike O'Connor,et al. Cache coherence for GPU architectures , 2013, 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA).
[8] Timothy A. Davis,et al. The University of Florida sparse matrix collection , 2011, TOMS.
[9] David A. Wood,et al. Crossing Guard: Mediating Host-Accelerator Coherence Interactions , 2017, ASPLOS.
[10] Milo M. K. Martin,et al. Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset , 2005, CARN.
[11] Derek Hower,et al. HRF-Relaxed: Adapting HRF to the Complexities of Industrial Heterogeneous Memory Models , 2015, TACO.
[12] David A. Wood,et al. Heterogeneous system coherence for integrated CPU-GPU systems , 2013, 2013 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[13] Sarita V. Adve,et al. DeNovoND: efficient hardware support for disciplined non-determinism , 2013, ASPLOS '13.
[14] Sarita V. Adve,et al. Chasing Away RAts: Semantics and evaluation for relaxed atomics on heterogeneous systems , 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[15] Sarita V. Adve,et al. Efficient GPU synchronization without scopes: Saying no to complex consistency models , 2015, 2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[16] Mikko H. Lipasti,et al. Coarse-Grain Coherence Tracking: RegionScout and Region Coherence Arrays , 2006, IEEE Micro.
[17] Jeffrey B. Rothman,et al. Sector cache design and performance , 2000, Proceedings 8th International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (Cat. No.PR00728).
[18] Henry Wong,et al. Analyzing CUDA workloads using a detailed GPU simulator , 2009, 2009 IEEE International Symposium on Performance Analysis of Systems and Software.
[19] Sarita V. Adve,et al. DeNovoSync: Efficient Support for Arbitrary Synchronization without Writer-Initiated Invalidations , 2015, ASPLOS.
[20] Mark D. Hill,et al. Weak ordering—a new definition , 1998, ISCA '98.
[21] Antonio J. Peña,et al. Chai: Collaborative heterogeneous applications for integrated-architectures , 2017, 2017 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
[22] Snehasish Kumar,et al. Fusion: Design tradeoffs in coherent cache hierarchies for accelerators , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).
[23] David A. Wood,et al. Heterogeneous-race-free memory models , 2014, ASPLOS.
[24] Sarita V. Adve,et al. HeteroSync: A benchmark suite for fine-grained synchronization on tightly coupled GPUs , 2017, 2017 IEEE International Symposium on Workload Characterization (IISWC).
[25] David A. Wood,et al. Synchronization Using Remote-Scope Promotion , 2015, ASPLOS.
[26] Niraj K. Jha,et al. GARNET: A detailed on-chip network model inside a full-system simulator , 2009, 2009 IEEE International Symposium on Performance Analysis of Systems and Software.
[27] Jeffrey Stuecheli,et al. CAPI: A Coherent Accelerator Processor Interface , 2015, IBM J. Res. Dev.
[28] Jonathan White,et al. Carrizo: A High Performance, Energy Efficient 28 nm APU , 2016, IEEE Journal of Solid-State Circuits.
[29] David A. Wood,et al. QuickRelease: A throughput-oriented approach to release consistency on GPUs , 2014, 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA).
[30] Fredrik Larsson,et al. Simics: A Full System Simulation Platform , 2002, Computer.
[31] Josep Torrellas,et al. False Sharing and Spatial Locality in Multiprocessor Caches , 1994, IEEE Trans. Computers.
[32] Thomas M. Conte,et al. Manager-client pairing: A framework for implementing coherence hierarchies , 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[33] Ian Bratt,et al. The ARM® Mali-T880 Mobile GPU , 2015, 2015 IEEE Hot Chips 27 Symposium (HCS).
[34] Margaret Martonosi,et al. ArMOR: Defending against memory consistency model mismatches in heterogeneous architectures , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).
[35] David A. Wood,et al. Dynamic self-invalidation: reducing coherence overhead in shared-memory multiprocessors , 1995, Proceedings 22nd Annual International Symposium on Computer Architecture.
[36] Sandhya Dwarkadas,et al. Protozoa: adaptive granularity cache coherence , 2013, ISCA.
[37] Sarita V. Adve,et al. DeNovo: Rethinking the Memory Hierarchy for Disciplined Parallelism , 2011, 2011 International Conference on Parallel Architectures and Compilation Techniques.
[38] David A. Wood,et al. Lazy release consistency for GPUs , 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).