[1] Sarita V. Adve, et al. Revisiting the Complexity of Hardware Cache Coherence and Some Implications, 2014, ACM Trans. Archit. Code Optim.
[2] Sarita V. Adve, et al. Parallel programming must be deterministic by default, 2009.
[3] Sarita V. Adve, et al. DeNovoSync: Efficient Support for Arbitrary Synchronization without Writer-Initiated Invalidations, 2015, ASPLOS.
[4] Wenzhi Chen, et al. Efficient Timestamp-Based Cache Coherence Protocol for Many-Core Architectures, 2016, ICS.
[5] Sarita V. Adve, et al. Efficient GPU synchronization without scopes: Saying no to complex consistency models, 2015, 2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[6] Sarita V. Adve, et al. Memory models: a case for rethinking parallel languages and hardware, 2009, PODC '09.
[7] Jeffrey B. Rothman, et al. Sector cache design and performance, 2000, Proceedings 8th International Symposium on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (Cat. No.PR00728).
[8] Stefanos Kaxiras, et al. Racer: TSO consistency via race detection, 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[9] Milo M. K. Martin, et al. Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset, 2005, CARN.
[10] Chen Tian, et al. PREDATOR: predictive false sharing detection, 2014, PPoPP '14.
[11] Andreas G. Veneris, et al. L-CBF: A Low-Power, Fast Counting Bloom Filter Architecture, 2008, IEEE Transactions on Very Large Scale Integration (VLSI) Systems.
[12] David A. Wood, et al. A Primer on Memory Consistency and Cache Coherence, 2012, Synthesis Lectures on Computer Architecture.
[13] David A. Wood, et al. Lazy release consistency for GPUs, 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[14] Sarita V. Adve, et al. DeNovoND: efficient hardware support for disciplined non-determinism, 2013, ASPLOS '13.
[15] Alan L. Cox, et al. A comparison of entry consistency and lazy release consistency implementations, 1996, Proceedings. Second International Symposium on High-Performance Computer Architecture.
[16] Stefanos Kaxiras, et al. Complexity-effective multicore coherence, 2012, 2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT).
[17] Hans-Juergen Boehm, et al. Foundations of the C++ concurrency memory model, 2008, PLDI '08.
[18] Christoforos E. Kozyrakis, et al. Evaluating MapReduce for Multi-core and Multiprocessor Systems, 2007, 2007 IEEE 13th International Symposium on High Performance Computer Architecture.
[19] Swarnendu Biswas, et al. Rethinking Support for Region Conflict Exceptions, 2019, 2019 IEEE International Parallel and Distributed Processing Symposium (IPDPS).
[20] Alan L. Cox, et al. Lazy release consistency for software distributed shared memory, 1992, ISCA '92.
[21] Harish Patil, et al. Pin: building customized program analysis tools with dynamic instrumentation, 2005, PLDI '05.
[22] Kai Li, et al. Retrospective: virtual memory mapped network interface for the SHRIMP multicomputer, 1994, ISCA '98.
[23] Sarita V. Adve, et al. Spandex: A Flexible Interface for Efficient Heterogeneous Coherence, 2018, 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA).
[24] Jung Ho Ahn, et al. McPAT: An integrated power, area, and timing modeling framework for multicore and manycore architectures, 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[25] Burton H. Bloom. Space/time trade-offs in hash coding with allowable errors, 1970, CACM.
[26] David A. Wood, et al. QuickRelease: A throughput-oriented approach to release consistency on GPUs, 2014, 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA).
[27] Kai Li, et al. The PARSEC benchmark suite: Characterization and architectural implications, 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[28] Brandon Lucia, et al. Conflict exceptions: simplifying concurrent language semantics with precise hardware exceptions for data-races, 2010, ISCA.
[29] Janak H. Patel, et al. A low-overhead coherence solution for multiprocessors with private cache memories, 1984, ISCA '84.
[30] Stefanos Kaxiras, et al. SARC Coherence: Scaling Directory Cache Coherence in Performance and Power, 2010, IEEE Micro.
[31] Miguel Castro, et al. Efficient and flexible object sharing, 1996, Proceedings of the 1996 ICPP Workshop on Challenges for Parallel Processing.
[32] Brian N. Bershad, et al. Midway: shared memory parallel programming with entry consistency for distributed memory multiprocessors, 1991.
[33] Marcelo Cintra, et al. An OS-based alternative to full hardware coherence on tiled CMPs, 2008, 2008 IEEE 14th International Symposium on High Performance Computer Architecture.
[34] Sarita V. Adve, et al. DeNovo: Rethinking the Memory Hierarchy for Disciplined Parallelism, 2011, 2011 International Conference on Parallel Architectures and Compilation Techniques.
[35] Stefanos Kaxiras, et al. Automatic detection of extended data-race-free regions, 2017, 2017 IEEE/ACM International Symposium on Code Generation and Optimization (CGO).
[36] Tanvir Ahmed Khan, et al. Huron: hybrid false sharing detection and repair, 2019, PLDI.
[37] Stefanos Kaxiras, et al. Callback: Efficient synchronization without invalidation with a directory just for spin-waiting, 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).
[38] Alan J. Hu, et al. Protocol verification as a hardware design aid, 1992, Proceedings 1992 IEEE International Conference on Computer Design: VLSI in Computers & Processors.
[39] Dan Grossman, et al. RADISH: Always-on sound and complete race detection in software and hardware, 2012, 2012 39th Annual International Symposium on Computer Architecture (ISCA).