Using prediction to accelerate coherence protocols
[1] M. Karplus, et al. CHARMM: A program for macromolecular energy, minimization, and dynamics calculations, 1983.
[2] Parallelizing appbt for a shared-memory multiprocessor, 1985.
[3] Willy Zwaenepoel, et al. Adaptive software cache management for distributed shared memory architectures, 1990, Proceedings. The 17th Annual International Symposium on Computer Architecture.
[4] Anant Agarwal, et al. LimitLESS directories: A scalable cache coherence scheme, 1991, ASPLOS IV.
[5] David H. Bailey, et al. The NAS Parallel Benchmarks, 1991, Int. J. High Perform. Comput. Appl.
[6] Jean-Loup Baer, et al. An effective on-chip preloading scheme to reduce data access penalty, 1991, Proceedings of the 1991 ACM/IEEE Conference on Supercomputing (Supercomputing '91).
[7] Anoop Gupta, et al. Cache Invalidation Patterns in Shared-Memory Multiprocessors, 1992, IEEE Trans. Computers.
[8] Anoop Gupta, et al. The Stanford DASH multiprocessor, 1992, Computer.
[9] Robert J. Fowler, et al. Adaptive cache coherency for detecting migratory shared data, 1993, ISCA '93.
[10] Mats Brorsson, et al. An adaptive cache coherence protocol optimized for migratory sharing, 1993, ISCA '93.
[11] James R. Larus, et al. Cooperative shared memory: software and hardware for scalable multiprocessors, 1993, TOCS.
[12] Mark D. Hill, et al. An evaluation of directory protocols for medium-scale shared-memory multiprocessors, 1994, ICS '94.
[13] J. Larus, et al. Tempest and Typhoon: user-level shared memory, 1994, Proceedings of 21st International Symposium on Computer Architecture.
[14] Todd C. Mowry, et al. Tolerating latency through software-controlled data prefetching, 1994.
[15] James R. Larus, et al. Mechanisms for Cooperative Shared Memory, 1994.
[16] Per Stenström, et al. Simple compiler algorithms to reduce ownership overhead in cache coherence protocols, 1994, ASPLOS VI.
[17] Anoop Gupta, et al. The SPLASH-2 programs: characterization and methodological considerations, 1995, ISCA.
[18] Per Stenström, et al. A compiler algorithm that reduces read latency in ownership-based cache coherence protocols, 1995, International Conference on Parallel Architectures and Compilation Techniques.
[19] David A. Wood, et al. Dynamic self-invalidation: reducing coherence overhead in shared-memory multiprocessors, 1995, Proceedings 22nd Annual International Symposium on Computer Architecture.
[20] James R. Larus, et al. Efficient support for irregular applications on distributed-memory machines, 1995, PPOPP '95.
[21] Josep Torrellas, et al. Distance-adaptive update protocols for scalable shared-memory multiprocessors, 1996, Proceedings. Second International Symposium on High-Performance Computer Architecture.
[22] James R. Larus, et al. Teapot: language support for writing memory coherence protocols, 1996, PLDI '96.
[23] Babak Falsafi, et al. Coherent Network Interfaces for Fine-Grain Communication, 1996, 23rd Annual International Symposium on Computer Architecture (ISCA'96).
[24] Dean M. Tullsen, et al. Exploiting Choice: Instruction Fetch and Issue on an Implementable Simultaneous Multithreading Processor, 1996, 23rd Annual International Symposium on Computer Architecture (ISCA'96).
[25] Tom Lovett, et al. STiNG: A CC-NUMA Computer System for the Commercial Marketplace, 1996, 23rd Annual International Symposium on Computer Architecture (ISCA'96).
[26] Sarita V. Adve, et al. Shared Memory Consistency Models: A Tutorial, 1996, Computer.
[27] Alan L. Cox, et al. Software DSM protocols that adapt between single writer and multiple writer, 1997, Proceedings Third International Symposium on High-Performance Computer Architecture.
[28] Sarita V. Adve, et al. An evaluation of fine-grain producer-initiated communication in cache-coherent multiprocessors, 1997, Proceedings Third International Symposium on High-Performance Computer Architecture.
[29] James Laudon, et al. The SGI Origin: A ccNUMA Highly Scalable Server, 1997, ISCA.
[30] Wen-mei W. Hwu, et al. Run-time Adaptive Cache Hierarchy Via Reference Analysis, 1997, Conference Proceedings. The 24th Annual International Symposium on Computer Architecture.
[31] Alternative implementations of two-level adaptive branch prediction, 1993, ISCA '98.
[32] James E. Smith, et al. A study of branch prediction strategies, 1981, ISCA '98.
[33] Mark Horowitz, et al. An evaluation of directory schemes for cache coherence, 1998, ISCA '98.
[34] Lockup-free instruction fetch/prefetch cache organization, 1981, ISCA '98.
[35] James R. Larus, et al. Wisconsin Wind Tunnel II: a fast, portable parallel architecture simulator, 2000, IEEE Concurr.