Efficient Address Translation for Architectures with Multiple Page Sizes
暂无分享,去创建一个
[1] Gabriel H. Loh,et al. Using TLB Speculation to Overcome Page Splintering in Virtual Machines , 2015 .
[2] Abhishek Bhattacharjee,et al. Translation-Triggered Prefetching , 2017, ASPLOS.
[3] Srilatha Manne,et al. Accelerating two-dimensional page walks for virtualized systems , 2008, ASPLOS.
[4] Vivien Quéma,et al. Large Pages May Be Harmful on NUMA Systems , 2014, USENIX Annual Technical Conference.
[5] Alan L. Cox,et al. Practical, transparent operating system support for superpages , 2002, OPSR.
[6] Margaret Martonosi,et al. Shared last-level TLBs for chip multiprocessors , 2011, 2011 IEEE 17th International Symposium on High Performance Computer Architecture.
[7] Stephen W. Keckler,et al. Page Placement Strategies for GPUs within Heterogeneous Memory Systems , 2015, ASPLOS.
[8] Abhishek Bhattacharjee,et al. Architectural support for address translation on GPUs: designing memory management units for CPU/GPUs with unified address spaces , 2014, ASPLOS.
[9] Guang R. Gao,et al. An energy efficient TLB design methodology , 2005, ISLPED '05. Proceedings of the 2005 International Symposium on Low Power Electronics and Design, 2005..
[10] Tomás Lang,et al. Reducing TLB power requirements , 1997, Proceedings of 1997 International Symposium on Low Power Electronics and Design.
[11] M. Frans Kaashoek,et al. Scalable address spaces using RCU balanced trees , 2012, ASPLOS XVII.
[12] No License,et al. Intel ® 64 and IA-32 Architectures Software Developer ’ s Manual Volume 3 A : System Programming Guide , Part 1 , 2006 .
[13] Margaret Martonosi,et al. COATCheck: Verifying Memory Ordering at the Hardware-OS Interface , 2016, ASPLOS.
[14] Kai Li,et al. The PARSEC benchmark suite: Characterization and architectural implications , 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[15] Michael M. Swift,et al. Agile Paging: Exceeding the Best of Nested and Shadow Paging , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[16] Osman S. Unsal,et al. Energy-efficient address translation , 2016, 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[17] François Bodin,et al. Skewed associativity enhances performance predictability , 1995, Proceedings 22nd Annual International Symposium on Computer Architecture.
[18] Jayneel Gandhi,et al. Efficient Memory Virtualization , 2016 .
[19] Mark D. Hill,et al. Tradeoffs in supporting two page sizes , 1992, ISCA '92.
[20] Aamer Jaleel,et al. CoLT: Coalesced Large-Reach TLBs , 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.
[21] Mark D. Hill,et al. Surpassing the TLB performance of superpages with less operating system support , 1994, ASPLOS VI.
[22] Fei Guo,et al. Proactively Breaking Large Pages to Improve Memory Overcommitment Performance in VMware ESXi , 2015, VEE.
[23] Abhishek Bhattacharjee,et al. Architectural support for address translation on GPUs: designing memory management units for CPU/GPUs with unified address spaces , 2013, ASPLOS.
[24] Kevin Skadron,et al. A characterization of the Rodinia benchmark suite with comparison to contemporary CMP workloads , 2010, IEEE International Symposium on Workload Characterization (IISWC'10).
[25] Michael M. Swift,et al. Reducing memory reference energy with opportunistic virtual caching , 2012, 2012 39th Annual International Symposium on Computer Architecture (ISCA).
[26] Mahmut T. Kandemir,et al. Generating physical addresses directly for saving instruction TLB energy , 2002, MICRO.
[27] Thomas F. Wenisch,et al. Unlocking bandwidth for GPUs in CC-NUMA systems , 2015, 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA).
[28] Osman S. Unsal,et al. Redundant Memory Mappings for fast access to large memories , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).
[29] Ján Veselý,et al. Observations and opportunities in architecting shared virtual memory for heterogeneous systems , 2016, 2016 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
[30] David A. Wood,et al. Supporting x86-64 address translation for 100s of GPU lanes , 2014, 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA).
[31] Gabriel H. Loh,et al. Increasing TLB reach by exploiting clustering in page translations , 2014, 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA).
[32] Daniel J. Sorin,et al. Specifying and dynamically verifying address translation-aware memory consistency , 2010, ASPLOS XV.
[33] Babak Falsafi,et al. Clearing the clouds: a study of emerging scale-out workloads on modern hardware , 2012, ASPLOS XVII.
[34] André Seznec,et al. A case for two-way skewed-associative caches , 1993, ISCA '93.
[35] Christoforos E. Kozyrakis,et al. The ZCache: Decoupling Ways and Associativity , 2010, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture.
[36] Abhishek Bhattacharjee,et al. Large-reach memory management unit caches , 2013, MICRO.
[37] David W. Nellans,et al. Towards high performance paged memory for GPUs , 2016, 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[38] Harish Patil,et al. Pin: building customized program analysis tools with dynamic instrumentation , 2005, PLDI '05.
[39] Ján Veselý,et al. Large pages and lightweight memory management in virtualized environments: Can you have it both ways? , 2015, 2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[40] Abhishek Bhattacharjee,et al. Address Translation for Throughput-Oriented Accelerators , 2015, IEEE Micro.
[41] Norman P. Jouppi,et al. CACTI 6.0: A Tool to Model Large Caches , 2009 .
[42] Xin Tong,et al. Prediction-based superpage-friendly TLB designs , 2015, 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA).
[43] Michael M. Swift,et al. Efficient virtual memory for big memory servers , 2013, ISCA.