Latency reduction techniques in chip multiprocessor cache systems
[1] Michael Zhang,et al. Victim Migration: Dynamically Adapting Between Private and Shared CMP Caches , 2005 .
[2] D. Lenoski,et al. The SGI Origin: A ccNUMA Highly Scalable Server , 1997, Conference Proceedings. The 24th Annual International Symposium on Computer Architecture.
[3] Hugh Garraway. Parallel Computer Architecture: A Hardware/Software Approach , 1999, IEEE Concurrency.
[4] Anant Agarwal,et al. APRIL: a processor architecture for multiprocessing , 1990, Proceedings. The 17th Annual International Symposium on Computer Architecture.
[5] Lixin Zhang,et al. Adaptive mechanisms and policies for managing cache hierarchies in chip multiprocessors , 2005, 32nd International Symposium on Computer Architecture (ISCA'05).
[6] Fredrik Larsson,et al. Simics: A Full System Simulation Platform , 2002, Computer.
[7] P. Stenstrom. A survey of cache coherence schemes for multiprocessors , 1990, Computer.
[8] Michael Zhang,et al. Victim Replication: Maximizing Capacity while Hiding Wire Delay in Tiled Chip Multiprocessors , 2005, ISCA 2005.
[9] Janak H. Patel,et al. A low-overhead coherence solution for multiprocessors with private cache memories , 1998, ISCA '98.
[10] Pradip Bose,et al. Optimizing pipelines for power and performance , 2002, 35th Annual IEEE/ACM International Symposium on Microarchitecture, 2002. (MICRO-35). Proceedings.
[11] Donald Yeung,et al. The MIT Alewife Machine , 1999, Proc. IEEE.
[12] Jean-Loup Baer,et al. An effective on-chip preloading scheme to reduce data access penalty , 1991, Proceedings of the 1991 ACM/IEEE Conference on Supercomputing (Supercomputing '91).
[13] Richard E. Kessler,et al. The Alpha 21264 microprocessor , 1999, IEEE Micro.
[14] Mark Horowitz,et al. An evaluation of directory schemes for cache coherence , 1998, ISCA '98.
[15] Anoop Gupta,et al. The Stanford Dash multiprocessor , 1992, Computer.
[16] Ken Kennedy,et al. Software prefetching , 1991, ASPLOS IV.
[17] N. Ranganathan,et al. Utilization of Cache Area in On-Chip Multiprocessor , 1999, ISHPC.
[18] Kunle Olukotun,et al. The case for a single-chip multiprocessor , 1996, ASPLOS VII.
[19] David H. Bailey,et al. The NAS Parallel Benchmarks , 1991, Int. J. High Perform. Comput. Appl.
[20] John B. Carter,et al. An argument for simple COMA , 1995, Future Gener. Comput. Syst.
[21] Anoop Gupta,et al. Tolerating Latency Through Software-Controlled Prefetching in Shared-Memory Multiprocessors , 1991, J. Parallel Distributed Comput.
[22] Stein Gjessing,et al. Distributed-directory scheme: scalable coherent interface , 1990, Computer.
[23] Eric Sprangle,et al. Increasing processor performance by implementing deeper pipelines , 2002, ISCA.
[24] Roland E. Wunderlich,et al. SMARTS: accelerating microarchitecture simulation via rigorous statistical sampling , 2003, 30th Annual International Symposium on Computer Architecture, 2003. Proceedings.
[25] Zeshan Chishti,et al. Optimizing replication, communication, and capacity allocation in CMPs , 2005, 32nd International Symposium on Computer Architecture (ISCA'05).
[26] Paul Feautrier,et al. A New Solution to Coherence Problems in Multicache Systems , 1978, IEEE Transactions on Computers.
[27] Doug Burger,et al. An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches , 2002, ASPLOS X.
[28] Anoop Gupta,et al. Interleaving: a multithreading technique targeting multiprocessors and workstations , 1994, ASPLOS VI.
[29] Alaa R. Alameldeen,et al. Addressing Workload Variability in Architectural Simulations , 2003, IEEE Micro.
[30] Mark D. Hill,et al. An evaluation of directory protocols for medium-scale shared-memory multiprocessors , 1994, ICS '94.
[31] Luiz André Barroso,et al. Piranha: a scalable architecture based on single-chip multiprocessing , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).
[32] Alan Jay Smith,et al. A class of compatible cache consistency protocols and their support by the IEEE futurebus , 1986, ISCA '86.
[33] Krste Asanovic,et al. Accelerating Multiprocessor Simulation with a Memory Timestamp Record , 2005, IEEE International Symposium on Performance Analysis of Systems and Software, 2005. ISPASS 2005.
[34] Pat Conway,et al. The AMD Opteron Processor for Multiprocessor Servers , 2003, IEEE Micro.
[35] Brad Calder,et al. Automatically characterizing large scale program behavior , 2002, ASPLOS X.
[36] Rohit Bhatia,et al. Montecito: a dual-core, dual-thread Itanium processor , 2005, IEEE Micro.