Reducing energy and increasing performance with traffic optimization in many-core systems
暂无分享,去创建一个
[1] David A. Patterson,et al. Computer Architecture, Fifth Edition: A Quantitative Approach , 2011 .
[2] Valentin Puente,et al. SP-NUCA: a cost effective dynamic non-uniform cache architecture , 2008, CARN.
[3] David A. Patterson,et al. Computer Architecture: A Quantitative Approach , 1969 .
[4] Anoop Gupta,et al. The SPLASH-2 programs: characterization and methodological considerations , 1995, ISCA.
[5] Krste Asanovic,et al. Victim replication: maximizing capacity while hiding wire delay in tiled chip multiprocessors , 2005, 32nd International Symposium on Computer Architecture (ISCA'05).
[6] Rajeev Balasubramonian,et al. Dynamic hardware-assisted software-controlled page placement to manage capacity allocation and sharing within large caches , 2009, 2009 IEEE 15th International Symposium on High Performance Computer Architecture.
[7] Babak Falsafi,et al. Reactive NUCA: near-optimal block placement and replication in distributed caches , 2009, ISCA '09.
[8] George Kurian,et al. Graphite: A distributed parallel simulator for multicores , 2010, HPCA - 16 2010 The Sixteenth International Symposium on High-Performance Computer Architecture.
[9] Zeshan Chishti,et al. Optimizing replication, communication, and capacity allocation in CMPs , 2005, 32nd International Symposium on Computer Architecture (ISCA'05).
[10] Payman Zarkesh-Ha,et al. Modeling NoC traffic locality and energy consumption with rent's communication probability distribution , 2010, SLIP '10.
[11] Sanjay J. Patel,et al. WayPoint: Scaling coherence to 1000-core architectures , 2010, 2010 19th International Conference on Parallel Architectures and Compilation Techniques (PACT).
[12] Chandra Krintz,et al. Cache-conscious data placement , 1998, ASPLOS VIII.
[13] Mahmut T. Kandemir,et al. A novel migration-based NUCA design for Chip Multiprocessors , 2008, 2008 SC - International Conference for High Performance Computing, Networking, Storage and Analysis.
[14] Sangyeun Cho,et al. Managing Distributed, Shared L2 Caches through OS-Level Page Allocation , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).
[15] Jaehyuk Huh,et al. A NUCA Substrate for Flexible CMP Cache Sharing , 2007, IEEE Transactions on Parallel and Distributed Systems.
[16] Mainak Chaudhuri. PageNUCA: Selected policies for page-grain locality management in large shared chip-multiprocessor caches , 2009, 2009 IEEE 15th International Symposium on High Performance Computer Architecture.
[17] Andrew B. Kahng,et al. ORION 2.0: A fast and accurate NoC power and area model for early-stage design space exploration , 2009, 2009 Design, Automation & Test in Europe Conference & Exhibition.
[18] Sanjay J. Patel,et al. WAYPOINT: scaling coherence to thousand-core architectures , 2010, PACT '10.
[19] Anant Agarwal,et al. Directory-based cache coherence in large-scale multiprocessors , 1990, Computer.