Translation-Triggered Prefetching
[1] M. Frans Kaashoek et al. RadixVM: scalable address spaces for multithreaded applications, 2013, EuroSys '13.
[2] Norman P. Jouppi et al. Rethinking DRAM design and organization for energy-constrained multi-cores, 2010, ISCA.
[3] Osman S. Unsal et al. Energy-efficient address translation, 2016, 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[4] Onur Mutlu et al. The Blacklisting Memory Scheduler: Achieving high performance and fairness at low cost, 2014, 2014 IEEE 32nd International Conference on Computer Design (ICCD).
[5] Onur Mutlu et al. Research Problems and Opportunities in Memory Systems, 2014, Supercomput. Front. Innov.
[6] Jayneel Gandhi et al. Efficient Memory Virtualization, 2016.
[7] O Seongil et al. Reducing memory access latency with asymmetric DRAM bank organizations, 2013, ISCA.
[8] Onur Mutlu et al. Gather-Scatter DRAM: In-DRAM address translation to improve the spatial locality of non-unit strided accesses, 2015, 2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[9] William J. Dally et al. Memory access scheduling, 2000, Proceedings of the 27th International Symposium on Computer Architecture.
[10] Ján Veselý et al. Large pages and lightweight memory management in virtualized environments: Can you have it both ways?, 2015, 2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[11] Srinivas Devadas et al. IMP: Indirect memory prefetcher, 2015, 2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[12] Kang G. Shin et al. Design and Implementation of Power-Aware Virtual Memory, 2003, USENIX ATC, General Track.
[13] Rachata Ausavarungnirun et al. RowClone: Fast and energy-efficient in-DRAM bulk data copy and initialization, 2013, 2013 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[14] James E. Smith et al. Fair Queuing Memory Systems, 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).
[15] Abhishek Bhattacharjee et al. Large-reach memory management unit caches, 2013, MICRO.
[16] Bruce Jacob et al. The Memory System: You Can't Avoid It, You Can't Ignore It, You Can't Fake It, 2009.
[17] Seth H. Pugsley et al. Efficiently prefetching complex address patterns, 2015, 2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[18] Dirk Grunwald et al. A stateless, content-directed data prefetching mechanism, 2002, ASPLOS X.
[19] Mor Harchol-Balter et al. ATLAS: A Scalable and High-Performance Scheduling Algorithm for Multiple Memory Controllers, 2010.
[20] Onur Mutlu et al. Tiered-latency DRAM: A low latency and low cost DRAM architecture, 2013, 2013 IEEE 19th International Symposium on High Performance Computer Architecture (HPCA).
[21] David W. Nellans et al. Micro-pages: increasing DRAM efficiency with locality-aware data placement, 2010, ASPLOS XV.
[22] Margaret Martonosi et al. Shared last-level TLBs for chip multiprocessors, 2011, 2011 IEEE 17th International Symposium on High Performance Computer Architecture.
[23] R. Govindarajan et al. Multiple sub-row buffers in DRAM: unlocking performance and energy improvement opportunities, 2012, ICS '12.
[24] Margaret Martonosi et al. TLB Improvements for Chip Multiprocessors: Inter-Core Cooperative Prefetchers and Shared Last-Level TLBs, 2013, TACO.
[25] Onur Mutlu et al. BLISS: Balancing Performance, Fairness and Complexity in Memory Access Scheduling, 2016, IEEE Transactions on Parallel and Distributed Systems.
[26] Michael M. Swift et al. Reducing memory reference energy with opportunistic virtual caching, 2012, 2012 39th Annual International Symposium on Computer Architecture (ISCA).
[27] Onur Mutlu et al. Memory scaling: A systems architecture perspective, 2013, 2013 5th IEEE International Memory Workshop.
[28] Rachata Ausavarungnirun et al. Row buffer locality aware caching policies for hybrid memories, 2012, 2012 IEEE 30th International Conference on Computer Design (ICCD).
[29] M. Frans Kaashoek et al. Scalable address spaces using RCU balanced trees, 2012, ASPLOS XVII.
[30] Abhishek Bhattacharjee et al. Efficient Address Translation for Architectures with Multiple Page Sizes, 2017, ASPLOS.
[31] Onur Mutlu et al. Parallelism-Aware Batch Scheduling: Enhancing both Performance and Fairness of Shared DRAM Systems, 2008, 2008 International Symposium on Computer Architecture.
[32] Margaret Martonosi et al. COATCheck: Verifying Memory Ordering at the Hardware-OS Interface, 2016, ASPLOS.
[33] Uri C. Weiser et al. Loop-Aware Memory Prefetching Using Code Block Working Sets, 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.
[34] Gregory F. Snyder et al. The Illustris simulation: Public data release, 2015, Astron. Comput.
[35] Alan L. Cox et al. Practical, transparent operating system support for superpages, 2002, OPSR.
[36] Osman S. Unsal et al. Redundant Memory Mappings for fast access to large memories, 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).
[37] Chia-Lin Yang et al. Improving DRAM latency with dynamic asymmetric subarray, 2015, 2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[38] Natalie D. Enright Jerger et al. Achieving predictable performance through better memory controller placement in many-core CMPs, 2009, ISCA '09.
[39] Onur Mutlu et al. ChargeCache: Reducing DRAM latency by exploiting row access locality, 2016, 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[40] Onur Mutlu et al. Low-Cost Inter-Linked Subarrays (LISA): Enabling fast inter-subarray data movement in DRAM, 2016, 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[41] Uri C. Weiser et al. Semantic locality and context-based prefetching using reinforcement learning, 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).
[42] Alan L. Cox et al. Translation caching: skip, don't walk (the page table), 2010, ISCA.
[43] Onur Mutlu et al. A case for exploiting subarray-level parallelism (SALP) in DRAM, 2012, 2012 39th Annual International Symposium on Computer Architecture (ISCA).
[44] Thomas F. Wenisch et al. CoScale: Coordinating CPU and Memory System DVFS in Server Systems, 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.
[45] David W. Nellans et al. Prediction Based DRAM Row-Buffer Management in the Many-Core Era, 2011, 2011 International Conference on Parallel Architectures and Compilation Techniques.
[46] Alan L. Cox et al. SpecTLB: A mechanism for speculative address translation, 2011, 2011 38th Annual International Symposium on Computer Architecture (ISCA).
[47] Ján Veselý et al. Observations and opportunities in architecting shared virtual memory for heterogeneous systems, 2016, 2016 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
[48] Gabriel H. Loh et al. Increasing TLB reach by exploiting clustering in page translations, 2014, 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA).
[49] Srilatha Manne et al. Accelerating two-dimensional page walks for virtualized systems, 2008, ASPLOS.
[50] Onur Mutlu et al. Improving DRAM performance by parallelizing refreshes with accesses, 2014, 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA).
[51] Margaret Martonosi et al. Inter-Core Cooperative TLB Prefetchers for Chip Multiprocessors, 2010, ASPLOS 2010.
[52] Aamer Jaleel et al. CoLT: Coalesced Large-Reach TLBs, 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.
[53] Xin Tong et al. Prediction-based superpage-friendly TLB designs, 2015, 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA).
[54] Michael M. Swift et al. Efficient virtual memory for big memory servers, 2013, ISCA.
[55] Zhimin Zhang et al. RBPP: A row based DRAM page policy for the many-core era, 2014, 2014 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS).
[56] Qingyuan Deng et al. MemScale: active low-power modes for main memory, 2011, ASPLOS XVI.