Runtime-driven shared last-level cache management for task-parallel programs
暂无分享,去创建一个
[1] S. Kim,et al. Fair cache sharing and partitioning in a chip multiprocessor architecture , 2004, Proceedings. 13th International Conference on Parallel Architecture and Compilation Techniques, 2004. PACT 2004..
[2] Guru Venkataramani,et al. MemTracker: Efficient and Programmable Support for Memory Access Monitoring and Debugging , 2007, 2007 IEEE 13th International Symposium on High Performance Computer Architecture.
[3] Yale N. Patt,et al. Utility-Based Cache Partitioning: A Low-Overhead, High-Performance, Runtime Mechanism to Partition Shared Caches , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).
[4] David Eklov,et al. Reducing Cache Pollution Through Detection and Elimination of Non-Temporal Memory Accesses , 2010, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.
[5] Yan Solihin,et al. Quality of service shared cache management in chip multiprocessor architecture , 2010, TACO.
[6] G. Edward Suh,et al. Dynamic Partitioning of Shared Cache Memory , 2004, The Journal of Supercomputing.
[7] Kristof Beyls,et al. Generating cache hints for improved program efficiency , 2005, J. Syst. Archit..
[8] Aamer Jaleel,et al. Adaptive insertion policies for high performance caching , 2007, ISCA '07.
[9] Stefanos Kaxiras,et al. Cache replacement based on reuse-distance prediction , 2007, 2007 25th International Conference on Computer Design.
[10] Xiaoning Ding,et al. ULCC: a user-level facility for optimizing shared cache performance on multicores , 2011, PPoPP '11.
[11] Alexander Aiken,et al. Legion: Expressing locality and independence with logical regions , 2012, 2012 International Conference for High Performance Computing, Networking, Storage and Analysis.
[12] Mahmut T. Kandemir,et al. Intra-application cache partitioning , 2010, 2010 IEEE International Symposium on Parallel & Distributed Processing (IPDPS).
[13] Jesús Labarta,et al. Handling task dependencies under strided and aliased references , 2010, ICS '10.
[14] Mateo Valero,et al. Improving Cache Management Policies Using Dynamic Reuse Distances , 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.
[15] Jesús Labarta,et al. A dependency-aware task-based programming environment for multi-core architectures , 2008, 2008 IEEE International Conference on Cluster Computing.
[16] Milo M. K. Martin,et al. Multifacet's general execution-driven multiprocessor simulator (GEMS) toolset , 2005, CARN.
[17] Bradley C. Kuszmaul,et al. Cilk: an efficient multithreaded runtime system , 1995, PPOPP '95.
[18] Eduard Ayguadé,et al. Task Superscalar: An Out-of-Order Task Pipeline , 2010, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture.
[19] Srihari Makineni,et al. Communist, Utilitarian, and Capitalist cache policies on CMPs: Caches as a shared resource , 2006, 2006 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[20] Aamer Jaleel,et al. Adaptive insertion policies for managing shared caches , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[21] Ravi R. Iyer,et al. CQoS: a framework for enabling QoS in shared caches of CMP platforms , 2004, ICS '04.
[22] Xi Yang,et al. Why nothing matters: the impact of zeroing , 2011, OOPSLA '11.
[23] Per Stenström,et al. Runtime-Guided Cache Coherence Optimizations in Multi-core Architectures , 2014, 2014 IEEE 28th International Parallel and Distributed Processing Symposium.
[24] Arnold L. Rosenberg,et al. Using the compiler to improve cache replacement decisions , 2002, Proceedings.International Conference on Parallel Architectures and Compilation Techniques.
[25] Laszlo A. Belady,et al. A Study of Replacement Algorithms for Virtual-Storage Computer , 1966, IBM Syst. J..
[26] Daniel Aarno,et al. Full-System Simulation from Embedded to High-Performance Systems , 2010 .
[27] Chen Ding,et al. Pacman: program-assisted cache management , 2013, ISMM '13.
[28] Vijay S. Pai,et al. Imbalanced cache partitioning for balanced data-parallel programs , 2013, 2013 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[29] Alejandro Duran,et al. Ompss: a Proposal for Programming Heterogeneous Multi-Core Architectures , 2011, Parallel Process. Lett..
[30] Zhao Zhang,et al. Soft-OLP: Improving Hardware Cache Performance through Software-Controlled Object-Level Partitioning , 2009, 2009 18th International Conference on Parallel Architectures and Compilation Techniques.
[31] Gurindar S. Sohi,et al. Serialization sets: a dynamic dependence-based parallel execution model , 2009, PPoPP '09.
[32] John Turek,et al. Optimal Partitioning of Cache Memory , 1992, IEEE Trans. Computers.
[33] David Xinliang Li,et al. Automated locality optimization based on the reuse distance of string operations , 2011, International Symposium on Code Generation and Optimization (CGO 2011).
[34] Aamer Jaleel,et al. High performance cache replacement using re-reference interval prediction (RRIP) , 2010, ISCA.
[35] Alejandro Duran,et al. The Design of OpenMP Tasks , 2009, IEEE Transactions on Parallel and Distributed Systems.
[36] Dionisios N. Pnevmatikatos,et al. Prefetching and cache management using task lifetimes , 2013, ICS '13.