Integrated Coherence Prediction: Towards Efficient Cache Coherence on NoC-Based Multicore Architectures
[1] Josep Torrellas,et al. Data forwarding in scalable shared-memory multiprocessors , 1995, ICS '95.
[2] Andrew B. Kahng,et al. ORION 2.0: A fast and accurate NoC power and area model for early-stage design space exploration , 2009, 2009 Design, Automation & Test in Europe Conference & Exhibition.
[3] Anoop Gupta,et al. The Stanford Dash multiprocessor , 1992, Computer.
[4] Thomas F. Wenisch,et al. Store-ordered streaming of shared memory , 2005, 14th International Conference on Parallel Architectures and Compilation Techniques (PACT'05).
[5] Stefanos Kaxiras,et al. Coherence communication prediction in shared-memory multiprocessors , 2000, Proceedings Sixth International Symposium on High-Performance Computer Architecture. HPCA-6 (Cat. No.PR00550).
[6] Babak Falsafi,et al. Memory sharing predictor: the key to a speculative coherent DSM , 1999, ISCA.
[7] José González,et al. Owner Prediction for Accelerating Cache-to-Cache Transfer Misses in a cc-NUMA Architecture , 2002, ACM/IEEE SC 2002 Conference (SC'02).
[8] Tarek El-Ghazawi,et al. An adaptive cache coherence protocol for chip multiprocessors , 2010, IFMT '10.
[9] Sarita V. Adve,et al. An evaluation of fine-grain producer-initiated communication in cache-coherent multiprocessors , 1997, Proceedings Third International Symposium on High-Performance Computer Architecture.
[10] Stefanos Kaxiras,et al. Improving CC-NUMA performance using Instruction-based Prediction , 1999, Proceedings Fifth International Symposium on High-Performance Computer Architecture.
[11] Jaehyuk Huh,et al. A NUCA Substrate for Flexible CMP Cache Sharing , 2007, IEEE Transactions on Parallel and Distributed Systems.
[12] George Kurian,et al. The locality-aware adaptive cache coherence protocol , 2013, ISCA.
[13] Ronald G. Dreslinski,et al. The M5 Simulator: Modeling Networked Systems , 2006, IEEE Micro.
[14] José González,et al. The use of prediction for accelerating upgrade misses in cc-NUMA multiprocessors , 2002, Proceedings. International Conference on Parallel Architectures and Compilation Techniques.
[15] Sangyeun Cho,et al. Predicting Coherence Communication by Tracking Synchronization Points at Run Time , 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.
[16] Nong Xiao,et al. VBON: Toward efficient on-chip networks via hierarchical virtual bus , 2013, Microprocess. Microsystems.
[17] Nong Xiao,et al. An optimized multicore cache coherence design for exploiting communication locality , 2012, GLSVLSI '12.
[18] Richard E. Kessler,et al. Evaluating stream buffers as a secondary cache replacement , 1994, Proceedings of 21 International Symposium on Computer Architecture.
[19] Norman P. Jouppi,et al. Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers , 1990, ISCA 1990.
[20] Milo M. K. Martin,et al. Using destination-set prediction to improve the latency/bandwidth tradeoff in shared-memory multiprocessors , 2003, ISCA '03.
[21] Natalie D. Enright Jerger,et al. Virtual tree coherence: Leveraging regions and in-network multicast trees for scalable cache coherence , 2008, 2008 41st IEEE/ACM International Symposium on Microarchitecture.
[22] Amirali Baniasadi,et al. A Power-Aware Prediction-Based Cache Coherence Protocol for Chip Multiprocessors , 2007, 2007 IEEE International Parallel and Distributed Processing Symposium.
[23] John B. Carter,et al. An Adaptive Cache Coherence Protocol Optimized for Producer-Consumer Sharing , 2007, 2007 IEEE 13th International Symposium on High Performance Computer Architecture.
[24] Mark D. Hill,et al. Using prediction to accelerate coherence protocols , 1998, ISCA.
[25] Michael C. Huang,et al. Improving support for locality and fine-grain sharing in chip multiprocessors , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[26] Mario Lodde,et al. Built-in fast gather control network for efficient support of coherence protocols , 2013, IET Comput. Digit. Tech.
[27] T. Lovett,et al. STiNG: A CC-NUMA Computer System for the Commercial Marketplace , 1996, 23rd Annual International Symposium on Computer Architecture (ISCA'96).
[28] D. Lenoski,et al. The SGI Origin: A ccNUMA Highly Scalable Server , 1997, Conference Proceedings. The 24th Annual International Symposium on Computer Architecture.
[29] Laxmi N. Bhuyan,et al. Switch cache: a framework for improving the remote memory access latency of CC-NUMA multiprocessors , 1999, Proceedings Fifth International Symposium on High-Performance Computer Architecture.
[30] Alberto Ros,et al. DiCo-CMP: Efficient cache coherency in tiled CMP architectures , 2008, 2008 IEEE International Symposium on Parallel and Distributed Processing.
[31] Hai Zhou,et al. Parallel CAD: Algorithm Design and Programming Special Section Call for Papers TODAES: ACM Transactions on Design Automation of Electronic Systems , 2010 .
[32] Michael L. Scott,et al. The effect of network total order, broadcast, and remote-write capability on network-based shared memory computing , 2000, Proceedings Sixth International Symposium on High-Performance Computer Architecture. HPCA-6 (Cat. No.PR00550).
[33] Norman P. Jouppi,et al. Improving direct-mapped cache performance by the addition of a small fully-associative cache and prefetch buffers , 1990, [1990] Proceedings. The 17th Annual International Symposium on Computer Architecture.
[34] Jaehyuk Huh,et al. A NUCA substrate for flexible CMP cache sharing , 2005, ICS.
[35] Li-Shiuan Peh,et al. In-network cache coherence , 2006, IEEE Comput. Archit. Lett.
[36] Maged M. Michael,et al. Design and performance of directory caches for scalable shared memory multiprocessors , 1999, Proceedings Fifth International Symposium on High-Performance Computer Architecture.
[37] David L. Dill,et al. The Murphi Verification System , 1996, CAV.
[38] Henry Hoffmann,et al. On-Chip Interconnection Architecture of the Tile Processor , 2007, IEEE Micro.
[39] Anoop Gupta,et al. The Stanford FLASH multiprocessor , 1994, ISCA '94.
[40] Karthik Ramani,et al. Interconnect-Aware Coherence Protocols for Chip Multiprocessors , 2006, 33rd International Symposium on Computer Architecture (ISCA'06).
[41] Manoj Franklin,et al. Perceptron Based Consumer Prediction in Shared-Memory Multiprocessors , 2006, 2006 International Conference on Computer Design.
[42] Anoop Gupta,et al. The SPLASH-2 programs: characterization and methodological considerations , 1995, ISCA.
[43] Stefanos Kaxiras,et al. SARC Coherence: Scaling Directory Cache Coherence in Performance and Power , 2010, IEEE Micro.