Using prediction to accelerate coherence protocols
[1] David E. Culler,et al. Fine-grain parallelism with minimal hardware support: a compiler-controlled threaded abstract machine , 1991, ASPLOS IV.
[2] Sarita V. Adve,et al. Shared Memory Consistency Models: A Tutorial , 1996, Computer.
[3] James R. Larus,et al. Tempest: a substrate for portable parallel programs , 1995, Digest of Papers. COMPCON'95. Technologies for the Information Superhighway.
[4] Yale N. Patt,et al. Alternative implementations of two-level adaptive branch prediction , 1992, ISCA '92.
[5] James R. Larus,et al. Tempest and typhoon: user-level shared memory , 1994, ISCA '94.
[6] Tom Lovett,et al. STiNG: A CC-NUMA Computer System for the Commercial Marketplace , 1996, 23rd Annual International Symposium on Computer Architecture (ISCA'96).
[8] Anant Agarwal,et al. LimitLESS directories: A scalable cache coherence scheme , 1991, ASPLOS IV.
[9] Mats Brorsson,et al. An adaptive cache coherence protocol optimized for migratory sharing , 1993, ISCA '93.
[10] M. Karplus,et al. CHARMM: A program for macromolecular energy, minimization, and dynamics calculations , 1983 .
[11] Anoop Gupta,et al. Cache Invalidation Patterns in Shared-Memory Multiprocessors , 1992, IEEE Trans. Computers.
[13] M. Hill,et al. Weak ordering-a new definition , 1990, [1990] Proceedings. The 17th Annual International Symposium on Computer Architecture.
[14] Anoop Gupta,et al. Memory consistency and event ordering in scalable shared-memory multiprocessors , 1990, [1990] Proceedings. The 17th Annual International Symposium on Computer Architecture.
[15] Babak Falsafi,et al. Coherent Network Interfaces for Fine-Grain Communication , 1996, 23rd Annual International Symposium on Computer Architecture (ISCA'96).
[17] Robert J. Fowler,et al. Adaptive cache coherency for detecting migratory shared data , 1993, ISCA '93.
[18] Anant Agarwal,et al. APRIL: a processor architecture for multiprocessing , 1990, ISCA '90.
[19] Dean M. Tullsen,et al. Exploiting Choice: Instruction Fetch and Issue on an Implementable Simultaneous Multithreading Processor , 1996, 23rd Annual International Symposium on Computer Architecture (ISCA'96).
[20] Todd C. Mowry,et al. Tolerating latency through software-controlled data prefetching , 1994 .
[21] James R. Larus,et al. Teapot: language support for writing memory coherence protocols , 1996, PLDI '96.
[22] Per Stenström,et al. A compiler algorithm that reduces read latency in ownership-based cache coherence protocols , 1995, International Conference on Parallel Architectures and Compilation Techniques.
[23] David A. Wood,et al. Dynamic self-invalidation: reducing coherence overhead in shared-memory multiprocessors , 1995, Proceedings 22nd Annual International Symposium on Computer Architecture.
[24] Håkan Grahn,et al. Evaluation of a Competitive-Update Cache Coherence Protocol with Migratory Data Detection , 1996, J. Parallel Distributed Comput..
[25] Susan J. Eggers,et al. Reducing false sharing on shared memory multiprocessors through compile time data transformations , 1995, PPOPP '95.
[26] Sarita V. Adve,et al. An evaluation of fine-grain producer-initiated communication in cache-coherent multiprocessors , 1997, Proceedings Third International Symposium on High-Performance Computer Architecture.
[27] James R. Larus,et al. Fine-grain access control for distributed shared memory , 1994, ASPLOS VI.
[28] Anna R. Karlin,et al. Competitive snoopy caching , 1986, 27th Annual Symposium on Foundations of Computer Science (sfcs 1986).
[29] Per Stenström,et al. Simple compiler algorithms to reduce ownership overhead in cache coherence protocols , 1994, ASPLOS VI.
[30] James R. Larus,et al. Application-specific protocols for user-level shared memory , 1994, Proceedings of Supercomputing '94.
[31] James R. Larus,et al. Efficient support for irregular applications on distributed-memory machines , 1995, PPOPP '95.
[32] D. Lenoski,et al. The SGI Origin: A ccNUMA Highly Scalable Server , 1997, Conference Proceedings. The 24th Annual International Symposium on Computer Architecture.
[33] James R. Larus,et al. Wisconsin Wind Tunnel II: a fast, portable parallel architecture simulator , 2000, IEEE Concurr..
[34] Maged M. Michael,et al. Coherence controller architectures for SMP-based CC-NUMA multiprocessors , 1997, ISCA '97.
[35] James E. Smith,et al. A study of branch prediction strategies , 1981, ISCA '81.
[36] Anoop Gupta,et al. Analysis of cache invalidation patterns in multiprocessors , 1989, ASPLOS III.
[37] Wen-mei W. Hwu,et al. Run-time Adaptive Cache Hierarchy Via Reference Analysis , 1997, Conference Proceedings. The 24th Annual International Symposium on Computer Architecture.
[39] James R. Larus,et al. Cooperative shared memory: software and hardware for scalable multiprocessors , 1992, ASPLOS V.
[40] Gurindar S. Sohi,et al. High-bandwidth data memory systems for superscalar processors , 1991, ASPLOS IV.
[41] Mark Horowitz,et al. An evaluation of directory schemes for cache coherence , 1998, ISCA '98.
[42] Willy Zwaenepoel,et al. Adaptive software cache management for distributed shared memory architectures , 1990, ISCA '90.
[43] Jean-Loup Baer,et al. An effective on-chip preloading scheme to reduce data access penalty , 1991, Proceedings of the 1991 ACM/IEEE Conference on Supercomputing (Supercomputing '91).
[44] Josep Torrellas,et al. Distance-adaptive update protocols for scalable shared-memory multiprocessors , 1996, Proceedings. Second International Symposium on High-Performance Computer Architecture.
[45] David H. Bailey,et al. The NAS Parallel Benchmarks , 1991, Int. J. High Perform. Comput. Appl..
[46] Andrew R. Pleszkun,et al. Implementing Precise Interrupts in Pipelined Processors , 1988, IEEE Trans. Computers.
[48] James R. Larus,et al. Mechanisms for cooperative shared memory , 1993, ISCA '93.
[49] Anoop Gupta,et al. The SPLASH-2 programs: characterization and methodological considerations , 1995, ISCA.
[50] Alan L. Cox,et al. Software DSM protocols that adapt between single writer and multiple writer , 1997, Proceedings Third International Symposium on High-Performance Computer Architecture.
[51] James R. Larus,et al. Cooperative shared memory: software and hardware for scalable multiprocessors , 1993, TOCS.
[52] Doug Burger,et al. Parallelizing appbt for a shared-memory multiprocessor , 1995 .
[53] Mark D. Hill,et al. An evaluation of directory protocols for medium-scale shared-memory multiprocessors , 1994, ICS '94.