Locality-oblivious cache organization leveraging single-cycle multi-hop NoCs
暂无分享,去创建一个
[1] Anoop Gupta,et al. Reducing Memory and Traffic Requirements for Scalable Directory-Based Cache Coherence Schemes , 1990, ICPP.
[2] Jaehyuk Huh,et al. Exploiting ILP, TLP, and DLP with the polymorphous TRIPS architecture , 2003, ISCA '03.
[3] Sangyeun Cho,et al. Managing Distributed, Shared L2 Caches through OS-Level Page Allocation , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).
[4] Jaehyuk Huh,et al. A NUCA Substrate for Flexible CMP Cache Sharing , 2007, IEEE Transactions on Parallel and Distributed Systems.
[5] David A. Wood,et al. TLC: Transmission Line Caches , 2003, MICRO.
[6] Timothy Mattson,et al. A 48-Core IA-32 message-passing processor with DVFS in 45nm CMOS , 2010, 2010 IEEE International Solid-State Circuits Conference - (ISSCC).
[7] Michael Zhang,et al. Victim Migration: Dynamically Adapting Between Private and Shared CMP Caches , 2005 .
[8] David A. Wood,et al. ASR: Adaptive Selective Replication for CMP Caches , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).
[9] Doug Burger,et al. An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches , 2002, ASPLOS X.
[10] Jichuan Chang,et al. Cooperative Caching for Chip Multiprocessors , 2006, 33rd International Symposium on Computer Architecture (ISCA'06).
[11] Zeshan Chishti,et al. Distance Associativity for High-Performance Energy-Efficient Non-Uniform Cache Architectures , 2003, MICRO.
[12] Zeshan Chishti,et al. Optimizing replication, communication, and capacity allocation in CMPs , 2005, 32nd International Symposium on Computer Architecture (ISCA'05).
[13] Milo M. K. Martin,et al. Token Coherence: decoupling performance and correctness , 2003, 30th Annual International Symposium on Computer Architecture, 2003. Proceedings..
[14] Antonio Robles,et al. Increasing the effectiveness of directory caches by deactivating coherence for private memory blocks , 2011, 2011 38th Annual International Symposium on Computer Architecture (ISCA).
[15] William J. Dally,et al. Microarchitecture of a high radix router , 2005, 32nd International Symposium on Computer Architecture (ISCA'05).
[16] Hyunjin Lee,et al. CloudCache: Expanding and shrinking private caches , 2011, 2011 IEEE 17th International Symposium on High Performance Computer Architecture.
[17] Krste Asanovic,et al. Victim replication: maximizing capacity while hiding wire delay in tiled chip multiprocessors , 2005, 32nd International Symposium on Computer Architecture (ISCA'05).
[18] Vladimir Stojanovic,et al. Characterization of Equalized and Repeated Interconnects for NoC Applications , 2008, IEEE Design & Test of Computers.
[19] Mainak Chaudhuri. PageNUCA: Selected policies for page-grain locality management in large shared chip-multiprocessor caches , 2009, 2009 IEEE 15th International Symposium on High Performance Computer Architecture.
[20] Norman P. Jouppi,et al. Multi-Core Cache Hierarchies , 2011, Multi-Core Cache Hierarchies.
[21] Chen Sun,et al. DSENT - A Tool Connecting Emerging Photonics with Electronics for Opto-Electronic Networks-on-Chip Modeling , 2012, 2012 IEEE/ACM Sixth International Symposium on Networks-on-Chip.
[22] Niraj K. Jha,et al. In-Network Snoop Ordering (INSO): Snoopy coherence on unordered interconnects , 2009, 2009 IEEE 15th International Symposium on High Performance Computer Architecture.
[23] Alan J. Hu,et al. Improving multiple-CMP systems using token coherence , 2005, 11th International Symposium on High-Performance Computer Architecture.
[24] George Kurian,et al. Graphite: A distributed parallel simulator for multicores , 2010, HPCA - 16 2010 The Sixteenth International Symposium on High-Performance Computer Architecture.
[25] William J. Dally,et al. Flattened butterfly: a cost-efficient topology for high-radix networks , 2007, ISCA '07.
[26] Anoop Gupta,et al. The SPLASH-2 programs: characterization and methodological considerations , 1995, ISCA.
[27] R. Ho. Chip Wires: Scaling and Efficiency , 2003 .
[28] Mark D. Hill,et al. Virtual hierarchies to support server consolidation , 2007, ISCA '07.
[29] Rajeev Balasubramonian,et al. Dynamic hardware-assisted software-controlled page placement to manage capacity allocation and sharing within large caches , 2009, 2009 IEEE 15th International Symposium on High Performance Computer Architecture.
[30] Milo M. K. Martin,et al. Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset , 2005, CARN.
[31] Anantha Chandrakasan,et al. Approaching the theoretical limits of a mesh NoC with a 16-node chip prototype in 45nm SOI , 2012, DAC Design Automation Conference 2012.
[32] Rajeev Balasubramonian. Multi-Core Cache Hierarchies [Synthesis Lectures on Computer Architecture] pdf - Rajeev Balasubramonian , 2016 .
[33] Kai Li,et al. The PARSEC benchmark suite: Characterization and architectural implications , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[34] Li-Shiuan Peh,et al. Breaking the on-chip latency barrier using SMART , 2013, 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA).
[35] Henry Hoffmann,et al. On-Chip Interconnection Architecture of the Tile Processor , 2007, IEEE Micro.
[36] BurgerDoug,et al. An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches , 2002 .
[37] Natalie D. Enright Jerger,et al. Virtual tree coherence: Leveraging regions and in-network multicast trees for scalable cache coherence , 2008, 2008 41st IEEE/ACM International Symposium on Microarchitecture.
[38] Li Shang,et al. Leveraging on-chip networks for data cache migration in chip multiprocessors , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[39] Simon W. Moore,et al. A communication characterisation of Splash-2 and Parsec , 2009, 2009 IEEE International Symposium on Workload Characterization (IISWC).
[40] Deborah A. Wallach. PHD: A Hierarchical Cache Coherent Protocol , 1992 .
[41] Sriram R. Vangal,et al. A 5-GHz Mesh Interconnect for a Teraflops Processor , 2007, IEEE Micro.
[42] Dhiraj K. Pradhan,et al. A hierarchical directory scheme for large-scale cache-coherent multiprocessors , 1992, Proceedings Sixth International Parallel Processing Symposium.
[43] Rami G. Melhem,et al. Dynamic cache clustering for chip multiprocessors , 2009, ICS '09.
[44] Babak Falsafi,et al. Reactive NUCA: near-optimal block placement and replication in distributed caches , 2009, ISCA '09.
[45] Niraj K. Jha,et al. GARNET: A detailed on-chip network model inside a full-system simulator , 2009, 2009 IEEE International Symposium on Performance Analysis of Systems and Software.
[46] Anantha Chandrakasan,et al. SMART: A single-cycle reconfigurable NoC for SoC applications , 2013, 2013 Design, Automation & Test in Europe Conference & Exhibition (DATE).
[47] William J. Dally,et al. The BlackWidow High-Radix Clos Network , 2006, 33rd International Symposium on Computer Architecture (ISCA'06).