Improving Multiprocessor Performance with Coarse-Grain Coherence Tracking
[1] Milo M. K. Martin, et al. Timestamp snooping: an approach for extending SMPs, 2000, ASPLOS.
[2] Jeffrey B. Rothman, et al. The pool of subsectors cache design, 1999, ICS '99.
[3] Balaram Sinharoy, et al. POWER4 system microarchitecture, 2002, IBM J. Res. Dev.
[4] Babak Falsafi, et al. JETTY: filtering snoops for reduced energy consumption in SMP servers, 2001, Proceedings HPCA Seventh International Symposium on High-Performance Computer Architecture.
[5] David A. Wood, et al. Dynamic self-invalidation: reducing coherence overhead in shared-memory multiprocessors, 1995, Proceedings 22nd Annual International Symposium on Computer Architecture.
[6] John S. Liptay, et al. Structural Aspects of the System/360 Model 85 II: The Cache, 1968, IBM Syst. J.
[7] Per Stenström, et al. TLB and snoop energy-reduction using virtual caches in low-power chip-multiprocessors, 2002, ISLPED '02.
[8] Cathy May, et al. The PowerPC Architecture: A Specification for a New Family of RISC Processors, 1994.
[9] Milo M. K. Martin, et al. Token Coherence: decoupling performance and correctness, 2003, 30th Annual International Symposium on Computer Architecture, 2003. Proceedings.
[10] Paul F. Reynolds, et al. Isotach Networks, 1997, IEEE Trans. Parallel Distributed Syst.
[11] A. Charlesworth. The Sun Fireplane System Interconnect , 2001, ACM/IEEE SC 2001 Conference (SC'01).
[12] Alan Jay Smith, et al. Experimental evaluation of on-chip microprocessor cache memories, 1984, ISCA 1984.
[13] Mikko H. Lipasti, et al. Power-Efficient Cache Coherence, 2004.
[14] A. Seznec, et al. Decoupled sectored caches: conciliating low tag implementation cost and low miss ratio, 1994, Proceedings of 21 International Symposium on Computer Architecture.
[15] Mikko H. Lipasti, et al. Precise and Accurate Processor Simulation, 2002.
[16] André Seznec, et al. Decoupled sectored caches: conciliating low tag implementation cost, 1994, ISCA '94.
[17] Milo M. K. Martin, et al. Using destination-set prediction to improve the latency/bandwidth tradeoff in shared-memory multiprocessors, 2003, ISCA '03.
[18] Thomas J. LeBlanc, et al. Adjustable block size coherent caches, 1992, ISCA '92.
[19] Alan Jay Smith, et al. A class of compatible cache consistency protocols and their support by the IEEE futurebus, 1986, ISCA '86.
[20] Balaram Sinharoy, et al. IBM Power5 chip: a dual-core multithreaded processor, 2004, IEEE Micro.
[21] Laxmi N. Bhuyan, et al. A dynamic cache sub-block design to reduce false sharing, 1995, Proceedings of ICCD '95 International Conference on Computer Design. VLSI in Computers and Processors.
[22] Milo M. K. Martin, et al. Simulating a $2M Commercial Server on a $2K PC, 2001.
[23] Anoop Gupta, et al. Two Techniques to Enhance the Performance of Memory Consistency Models, 1991, ICPP.
[24] Andreas Moshovos. RegionScout: Exploiting Coarse Grain Sharing in Snoop-Based Coherence , 2005, ISCA 2005.