Neighborhood-aware data locality optimization for NoC-based multicores
Mahmut T. Kandemir | Yuanrui Zhang | Jun Liu | Taylan Yemliha
[1] W. Dally,et al. Route packets, not wires: on-chip interconnection networks , 2001, Proceedings of the 38th Design Automation Conference.
[2] William Pugh,et al. The Omega Library interface guide , 1995 .
[3] Uday Bondhugula,et al. Data Layout Transformation for Enhancing Data Locality on NUCA Chip Multiprocessors , 2009, 2009 18th International Conference on Parallel Architectures and Compilation Techniques.
[4] Tulika Mitra,et al. Integrated scratchpad memory optimization and task scheduling for MPSoC architectures , 2006, CASES '06.
[5] Fredrik Larsson,et al. Simics: A Full System Simulation Platform , 2002, Computer.
[6] Max Baron. The single-chip cloud computer , 2010 .
[7] Zeshan Chishti,et al. Distance Associativity for High-Performance Energy-Efficient Non-Uniform Cache Architectures , 2003, MICRO.
[8] Zeshan Chishti,et al. Optimizing replication, communication, and capacity allocation in CMPs , 2005, 32nd International Symposium on Computer Architecture (ISCA'05).
[9] David A. Wood,et al. Managing Wire Delay in Large Chip-Multiprocessor Caches , 2004, 37th International Symposium on Microarchitecture (MICRO-37'04).
[10] David H. Bailey,et al. The NAS Parallel Benchmarks , 1991, Int. J. High Perform. Comput. Appl.
[11] Francesco Poletti,et al. Communication-aware allocation and scheduling framework for stream-oriented multi-processor systems-on-chip , 2006, Proceedings of the Design Automation & Test in Europe Conference.
[12] Rainer Leupers,et al. A modular simulation framework for spatial and temporal task mapping onto multi-processor SoC platforms , 2005, Design, Automation and Test in Europe.
[13] Evangelos P. Markatos,et al. Using processor affinity in loop scheduling on shared-memory multiprocessors , 1992, Supercomputing '92.
[14] Jim Held. "Single-chip Cloud Computer", an IA Tera-scale Research Processor , 2010, Euro-Par Workshops.
[15] Guy E. Blelloch,et al. Scheduling threads for constructive cache sharing on CMPs , 2007, SPAA '07.
[16] Norman P. Jouppi,et al. CACTI 3.0: an integrated cache timing, power, and area model , 2001 .
[17] Radu Marculescu,et al. User-Aware Dynamic Task Allocation in Networks-on-Chip , 2008, 2008 Design, Automation and Test in Europe.
[18] Frédéric Pétrot,et al. Comparison of memory write policies for NoC based Multicore Cache Coherent Systems , 2008, 2008 Design, Automation and Test in Europe.
[19] Michael E. Wolf,et al. Combining Loop Transformations Considering Caches and Scheduling , 2004, International Journal of Parallel Programming.
[20] Michael Wolfe,et al. High performance compilers for parallel computing , 1995 .
[21] Ken Kennedy,et al. Optimizing Compilers for Modern Architectures: A Dependence-based Approach , 2001 .
[22] Mahmut T. Kandemir,et al. Application mapping for chip multiprocessors , 2008, 2008 45th ACM/IEEE Design Automation Conference.
[23] Dean M. Tullsen,et al. Compiler Techniques for Reducing Data Cache Miss Rate on a Multithreaded Architecture , 2008, HiPEAC.
[24] Kai Li,et al. The PARSEC benchmark suite: Characterization and architectural implications , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[25] Chen Ding,et al. A hierarchical model of data locality , 2006, POPL '06.
[26] William J. Dally,et al. Flattened Butterfly Topology for On-Chip Networks , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).
[27] T. N. Vijaykumar,et al. Distance associativity for high-performance energy-efficient non-uniform cache architectures , 2003, Proceedings. 36th Annual IEEE/ACM International Symposium on Microarchitecture, 2003. MICRO-36.
[28] Xipeng Shen,et al. Does cache sharing on modern CMP matter to the performance of contemporary multithreaded programs? , 2010, PPoPP '10.
[29] Keith W. Ross,et al. Computer networking - a top-down approach featuring the internet , 2000 .
[30] Mahmut T. Kandemir,et al. Cache topology aware computation mapping for multicores , 2010, PLDI '10.
[31] Mahmut T. Kandemir,et al. Optimizing shared cache behavior of chip multiprocessors , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[32] Vincenzo Catania,et al. Multi-objective mapping for mesh-based NoC architectures , 2004, International Conference on Hardware/Software Codesign and System Synthesis, 2004. CODES+ISSS 2004.
[34] Hyunjin Lee,et al. A flexible data to L2 cache mapping approach for future multicore processors , 2006, MSPC '06.
[35] Doug Burger,et al. An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches , 2002, ASPLOS X.
[36] Rudolf Eigenmann,et al. SPEComp: A New Benchmark Suite for Measuring Parallel Computer Performance , 2001, WOMPAT.
[37] Scott A. Mahlke,et al. Data Access Partitioning for Fine-grain Parallelism on Multicore Architectures , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).
[38] R. Pop,et al. Mapping applications to NoC platforms with multithreaded processor resources , 2005, 2005 NORCHIP.
[39] Milo M. K. Martin,et al. Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset , 2005, CARN.
[40] Radu Marculescu,et al. Contention-aware application mapping for Network-on-Chip communication architectures , 2008, 2008 IEEE International Conference on Computer Design.