TSO-CC: Consistency directed cache coherence for TSO
[1] Antonio Robles,et al. Increasing the effectiveness of directory caches by deactivating coherence for private memory blocks , 2011, 2011 38th Annual International Symposium on Computer Architecture (ISCA).
[2] Sarita V. Adve,et al. DeNovoND: efficient hardware support for disciplined non-determinism , 2013, ASPLOS '13.
[3] Marcelo Cintra,et al. An OS-based alternative to full hardware coherence on tiled CMPs , 2008, 2008 IEEE 14th International Symposium on High Performance Computer Architecture.
[4] Michael F. Spear,et al. NOrec: streamlining STM by abolishing ownership records , 2010, PPoPP '10.
[5] Doug Burger,et al. An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches , 2002 .
[6] Sarita V. Adve,et al. Shared Memory Consistency Models: A Tutorial , 1996, Computer.
[7] Anoop Gupta,et al. Memory consistency and event ordering in scalable shared-memory multiprocessors , 1990, [1990] Proceedings. The 17th Annual International Symposium on Computer Architecture.
[8] Mark Horowitz,et al. An evaluation of directory schemes for cache coherence , 1998, ISCA '98.
[9] Ricardo Bianchini,et al. Lazy Release Consistency for Hardware-Coherent Multiprocessors , 1995, Proceedings of the IEEE/ACM SC95 Conference.
[10] Leslie Lamport,et al. Time, clocks, and the ordering of events in a distributed system , 1978, CACM.
[11] Paul Feautrier,et al. A New Solution to Coherence Problems in Multicache Systems , 1978, IEEE Transactions on Computers.
[12] Christoforos E. Kozyrakis,et al. SCD: A scalable coherence directory with flexible sharer set encoding , 2012, IEEE International Symposium on High-Performance Computer Architecture.
[13] Alan L. Cox,et al. TreadMarks: Distributed Shared Memory on Standard Workstations and Operating Systems , 1994, USENIX Winter.
[14] Mustaque Ahamad,et al. Implementing and programming causal distributed shared memory , 1991, [1991] Proceedings. 11th International Conference on Distributed Computing Systems.
[15] Doug Burger,et al. An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches , 2002, ASPLOS X.
[16] Mike O'Connor,et al. Cache coherence for GPU architectures , 2013, 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA).
[17] Robert H. B. Netzer. Optimal tracing and replay for debugging shared-memory parallel programs , 1993, PADD '93.
[18] Thomas J. Ashby,et al. Software-Based Cache Coherence with Hardware-Assisted Selective Self-Invalidations Using Bloom Filters , 2011, IEEE Transactions on Computers.
[19] David A. Wood,et al. Dynamic self-invalidation: reducing coherence overhead in shared-memory multiprocessors , 1995, Proceedings 22nd Annual International Symposium on Computer Architecture.
[20] Michel Dubois,et al. Correct memory operation of cache-based multiprocessors , 1987, ISCA '87.
[21] Deborah A. Wallach. PHD: A Hierarchical Cache Coherent Protocol , 1992 .
[22] Tianshi Chen,et al. DLS: Directoryless Shared Last-level Cache , 2012, ArXiv.
[23] Anoop Gupta,et al. Reducing Memory and Traffic Requirements for Scalable Directory-Based Cache Coherence Schemes , 1990, ICPP.
[24] Francesco Zappa Nardelli,et al. x86-TSO , 2010, Commun. ACM.
[25] Gil Neiger,et al. Causal memory: definitions, implementation, and programming , 1995, Distributed Computing.
[26] David A. Wood,et al. A Primer on Memory Consistency and Cache Coherence , 2012, Synthesis Lectures on Computer Architecture.
[27] Seth H. Pugsley,et al. SWEL: Hardware cache coherence protocols to map shared data onto shared caches , 2010, 2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT).
[28] Kunle Olukotun,et al. STAMP: Stanford Transactional Applications for Multi-Processing , 2008, 2008 IEEE International Symposium on Workload Characterization.
[29] Babak Falsafi,et al. Cuckoo directory: A scalable directory for many-core systems , 2011, 2011 IEEE 17th International Symposium on High Performance Computer Architecture.
[30] Alan L. Cox,et al. Lazy release consistency for software distributed shared memory , 1992, ISCA '92.
[31] Kai Li,et al. The PARSEC benchmark suite: Characterization and architectural implications , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[32] Zhiqiang Ma,et al. Ad Hoc Synchronization Considered Harmful , 2010, OSDI.
[33] Kevin Skadron,et al. Design issues and tradeoffs for write buffers , 1997, Proceedings Third International Symposium on High-Performance Computer Architecture.
[34] Somayeh Sardashti,et al. The gem5 simulator , 2011, CARN.
[35] Anoop Gupta,et al. Memory consistency and event ordering in scalable shared-memory multiprocessors , 1990, ISCA '90.
[36] Rajiv Gupta,et al. Dynamic recognition of synchronization operations for improved data race detection , 2008, ISSTA '08.
[37] Stefanos Kaxiras,et al. Complexity-effective multicore coherence , 2012, 2012 21st International Conference on Parallel Architectures and Compilation Techniques (PACT).
[38] Sarita V. Adve,et al. DeNovo: Rethinking the Memory Hierarchy for Disciplined Parallelism , 2011, 2011 International Conference on Parallel Architectures and Compilation Techniques.
[39] Willy Zwaenepoel,et al. Implementation and performance of Munin , 1991, SOSP '91.
[40] Sang Lyul Min,et al. Design and Analysis of a Scalable Cache Coherence Scheme Based on Clocks and Timestamps , 1992, IEEE Trans. Parallel Distributed Syst..
[41] Rami G. Melhem,et al. A timestamp-based selective invalidation scheme for multiprocessor cache coherence , 1996, Proceedings of the 1996 ICPP Workshop on Challenges for Parallel Processing.
[42] Sandhya Dwarkadas,et al. SPACE: Sharing pattern-based directory coherence for multicore scalability , 2010, 2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT).
[43] Michel Dubois,et al. Delayed consistency and its effects on the miss rate of parallel programs , 1991, Proceedings of the 1991 ACM/IEEE Conference on Supercomputing (Supercomputing '91).
[44] Niraj K. Jha,et al. GARNET: A detailed on-chip network model inside a full-system simulator , 2009, 2009 IEEE International Symposium on Performance Analysis of Systems and Software.
[45] Milo M. K. Martin,et al. Why on-chip cache coherence is here to stay , 2012, Commun. ACM.
[46] Michel Dubois,et al. Memory access buffering in multiprocessors , 1998, ISCA '98.
[47] Jade Alglave,et al. Litmus: Running Tests against Hardware , 2011, TACAS.
[48] Anoop Gupta,et al. The SPLASH-2 programs: characterization and methodological considerations , 1995, ISCA.
[49] Stefanos Kaxiras,et al. SARC Coherence: Scaling Directory Cache Coherence in Performance and Power , 2010, IEEE Micro.