Cache topology aware computation mapping for multicores
暂无分享,去创建一个
Mahmut T. Kandemir | Mary Jane Irwin | Yuanrui Zhang | Taylan Yemliha | Shekhar Srikantaiah | Sai Prashanth Muralidhara
[1] Kathryn S. McKinley,et al. A Parametrized Loop Fusion Algorithm for Improving Parallelism and Cache Locality , 1997, Comput. J..
[2] Srihari Makineni,et al. Communist, Utilitarian, and Capitalist cache policies on CMPs: Caches as a shared resource , 2006, 2006 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[3] Jihong Kim,et al. A reusability-aware cache memory sharing technique for high-performance low-power CMPs with private L2 caches , 2007, Proceedings of the 2007 international symposium on Low power electronics and design (ISLPED '07).
[4] Yves Robert,et al. Mapping and load-balancing iterative computations , 2004, IEEE Transactions on Parallel and Distributed Systems.
[5] Jichuan Chang,et al. Cooperative cache partitioning for chip multiprocessors , 2007, ICS '07.
[6] Xipeng Shen,et al. Does cache sharing on modern CMP matter to the performance of contemporary multithreaded programs? , 2010, PPoPP '10.
[7] G. Edward Suh,et al. Dynamic Partitioning of Shared Cache Memory , 2004, The Journal of Supercomputing.
[8] Mahmut T. Kandemir,et al. Compiler-directed channel allocation for saving power in on-chip networks , 2006, POPL '06.
[9] Milo M. K. Martin,et al. Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset , 2005, CARN.
[10] Won-Taek Lim,et al. Architectural support for operating system-driven CMP cache management , 2006, 2006 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[11] Frédéric Vivien,et al. Scheduling the Computations of a Loop Nest with Respect to a Given Mapping , 2000, Euro-Par.
[12] Evangelos P. Markatos,et al. Using processor affinity in loop scheduling on shared-memory multiprocessors , 1992, Supercomputing '92.
[13] Mahmut T. Kandemir,et al. Optimizing shared cache behavior of chip multiprocessors , 2009, 2009 42nd Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[14] Paul Feautrier,et al. Scalable and Structured Scheduling , 2006, International Journal of Parallel Programming.
[15] S. Kim,et al. Fair cache sharing and partitioning in a chip multiprocessor architecture , 2004, Proceedings. 13th International Conference on Parallel Architecture and Compilation Techniques, 2004. PACT 2004..
[16] Monica S. Lam,et al. Automatic computation and data decomposition for multiprocessors , 1997 .
[17] David A. Wood,et al. ASR: Adaptive Selective Replication for CMP Caches , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).
[18] Dean M. Tullsen,et al. Compiler Techniques for Reducing Data Cache Miss Rate on a Multithreaded Architecture , 2008, HiPEAC.
[19] Kai Li,et al. The PARSEC benchmark suite: Characterization and architectural implications , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[20] Erik Brockmeyer,et al. Data Access and Storage Management for Embedded Programmable Processors , 2002, Springer US.
[21] Roberto Bagnara,et al. The Parma Polyhedra Library: Toward a complete set of numerical abstractions for the analysis and verification of hardware and software systems , 2006, Sci. Comput. Program..
[22] John L. Henning. SPEC CPU2006 benchmark descriptions , 2006, CARN.
[23] Engin Ipek,et al. Coordinated management of multiple interacting resources in chip multiprocessors: A machine learning approach , 2008, 2008 41st IEEE/ACM International Symposium on Microarchitecture.
[24] Doug Burger,et al. An adaptive, non-uniform cache structure for wire-delay dominated on-chip caches , 2002, ASPLOS X.
[25] Jichuan Chang,et al. Cooperative Caching for Chip Multiprocessors , 2006, 33rd International Symposium on Computer Architecture (ISCA'06).
[26] David Gay,et al. Lightweight annotations for controlling sharing in concurrent data structures , 2009, PLDI '09.
[27] Frank Vahid,et al. Configurable cache subsetting for fast cache tuning , 2006, 2006 43rd ACM/IEEE Design Automation Conference.
[28] Mahmut T. Kandemir,et al. Organizing the last line of defense before hitting the memory wall for CMPs , 2004, 10th International Symposium on High Performance Computer Architecture (HPCA'04).
[29] Lixin Zhang,et al. Adaptive mechanisms and policies for managing cache hierarchies in chip multiprocessors , 2005, 32nd International Symposium on Computer Architecture (ISCA'05).
[30] Fredrik Larsson,et al. Simics: A Full System Simulation Platform , 2002, Computer.
[31] Pradeep Dubey,et al. Platform 2015: Intel ® Processor and Platform Evolution for the Next Decade , 2005 .
[32] Guy E. Blelloch,et al. Scheduling threads for constructive cache sharing on CMPs , 2007, SPAA '07.
[33] Yale N. Patt,et al. Utility-Based Cache Partitioning: A Low-Overhead, High-Performance, Runtime Mechanism to Partition Shared Caches , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).
[34] Krste Asanovic,et al. Victim replication: maximizing capacity while hiding wire delay in tiled chip multiprocessors , 2005, 32nd International Symposium on Computer Architecture (ISCA'05).
[35] William Pugh,et al. The Omega Library interface guide , 1995 .
[36] Chen Ding,et al. A hierarchical model of data locality , 2006, POPL '06.
[37] Hui Li,et al. Locality and Loop Scheduling on NUMA Multiprocessors , 1993, 1993 International Conference on Parallel Processing - ICPP'93.
[38] Yan Solihin,et al. QoS policies and architecture for cache/memory in CMP platforms , 2007, SIGMETRICS '07.
[39] Mahmut T. Kandemir,et al. Adaptive set pinning: managing shared caches in chip multiprocessors , 2008, ASPLOS.
[40] David A. Wood,et al. Managing Wire Delay in Large Chip-Multiprocessor Caches , 2004, 37th International Symposium on Microarchitecture (MICRO-37'04).