Rethinking TLB designs in virtualized environments: A very large part-of-memory TLB
[1] Mark Oskin,et al. A Software-Managed Approach to Die-Stacked DRAM , 2015, 2015 International Conference on Parallel Architecture and Compilation (PACT).
[2] Margaret Martonosi,et al. Shared last-level TLBs for chip multiprocessors , 2011, 2011 IEEE 17th International Symposium on High Performance Computer Architecture.
[3] Michael M. Swift,et al. Efficient Memory Virtualization: Reducing Dimensionality of Nested Page Walks , 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.
[4] Aamer Jaleel,et al. CoLT: Coalesced Large-Reach TLBs , 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.
[5] Aamer Jaleel,et al. CAMEO: A Two-Level Memory Organization with Capacity of Main Memory and Flexibility of Hardware-Managed Cache , 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.
[6] Srilatha Manne,et al. Accelerating two-dimensional page walks for virtualized systems , 2008, ASPLOS.
[7] Mark D. Hill,et al. Tradeoffs in supporting two page sizes , 1992, ISCA '92.
[8] Cheng-Chieh Huang,et al. ATCache: Reducing DRAM cache latency via a small SRAM tag cache , 2014, 2014 23rd International Conference on Parallel Architecture and Compilation (PACT).
[9] David Roberts,et al. Heterogeneous memory architectures: A HW/SW approach for mixing die-stacked and off-package memories , 2015, 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA).
[10] G. Kandiraju,et al. Going the distance for TLB prefetching: an application-driven study , 2002, Proceedings 29th Annual International Symposium on Computer Architecture.
[11] Aamer Jaleel,et al. BEAR: Techniques for mitigating bandwidth bloat in gigascale DRAM caches , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).
[12] Abhishek Bhattacharjee,et al. Large-reach memory management unit caches , 2013, MICRO.
[13] Ján Veselý,et al. Large pages and lightweight memory management in virtualized environments: Can you have it both ways? , 2015, 2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[14] Margaret Martonosi,et al. Inter-core cooperative TLB for chip multiprocessors , 2010, ASPLOS XV.
[15] Mark D. Hill,et al. Efficiently enabling conventional block sizes for very large die-stacked DRAM caches , 2011, 2011 44th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[16] Osman S. Unsal,et al. Energy-efficient address translation , 2016, 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[17] Dan Tsafrir,et al. Hash, Don't Cache (the Page Table) , 2016, SIGMETRICS.
[18] Gabriel H. Loh,et al. Using TLB Speculation to Overcome Page Splintering in Virtual Machines , 2015.
[19] Gabriel H. Loh,et al. A Mostly-Clean DRAM Cache for Effective Hit Speculation and Self-Balancing Dispatch , 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.
[20] Li Zhao,et al. Exploring DRAM cache architectures for CMP server platforms , 2007, 2007 25th International Conference on Computer Design.
[21] R. Govindarajan,et al. Bi-Modal DRAM Cache: Improving Hit Rate, Hit Latency and Bandwidth , 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.
[22] Babak Falsafi,et al. Die-stacked DRAM caches for servers: hit ratio, latency, or bandwidth? have it all with footprint cache , 2013, ISCA.
[23] Lixin Zhang,et al. Enigma: architectural and operating system support for reducing the impact of address translation , 2010, ICS '10.
[24] Anand Sivasubramaniam,et al. Going the distance for TLB prefetching: an application-driven study , 2002, ISCA.
[25] David J. Sager,et al. The microarchitecture of the Pentium 4 processor , 2001.
[26] Yuan Xie,et al. Simple but Effective Heterogeneous Main Memory with On-Chip Memory Controller Support , 2010, 2010 ACM/IEEE International Conference for High Performance Computing, Networking, Storage and Analysis.
[27] Per Stenström,et al. Recency-based TLB preloading , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).
[28] Alaa R. Alameldeen,et al. Transparent Hardware Management of Stacked DRAM as Part of Memory , 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.
[29] Babak Falsafi,et al. Unison Cache: A Scalable and Effective Die-Stacked DRAM Cache , 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.
[30] Lei Jiang,et al. Die Stacking (3D) Microarchitecture , 2006, 2006 39th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO'06).
[31] Babak Falsafi,et al. Toward Dark Silicon in Servers , 2011, IEEE Micro.
[32] R. Manikantan,et al. Bi-Modal DRAM Cache: A Scalable and Effective Die-Stacked DRAM Cache , 2014, MICRO 2014.
[33] Gabriel H. Loh,et al. Challenges in Heterogeneous Die-Stacked and Off-Chip Memory Systems , 2012.
[34] Michael M. Swift,et al. Efficient virtual memory for big memory servers , 2013, ISCA.
[35] Gabriel H. Loh,et al. Increasing TLB reach by exploiting clustering in page translations , 2014, 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA).
[36] Alan L. Cox,et al. Translation caching: skip, don't walk (the page table) , 2010, ISCA.
[37] Onur Mutlu,et al. Ramulator: A Fast and Extensible DRAM Simulator , 2016, IEEE Computer Architecture Letters.
[38] Yan Solihin,et al. CHOP: Integrating DRAM Caches for CMP Server Platforms , 2011, IEEE Micro.
[39] Norman P. Jouppi,et al. CACTI: an enhanced cache access and cycle time model , 1996, IEEE J. Solid State Circuits.
[40] Alan L. Cox,et al. SpecTLB: A mechanism for speculative address translation , 2011, 2011 38th Annual International Symposium on Computer Architecture (ISCA).
[41] Celal Ozturk,et al. Analyzing and quantifying dynamic program behavior in terms of regularities and patterns , 2013.
[42] Gabriel H. Loh,et al. Fundamental Latency Trade-off in Architecting DRAM Caches: Outperforming Impractical SRAM-Tags with a Simple and Practical Design , 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.
[43] Abhishek Bhattacharjee,et al. Large-reach memory management unit caches: Coalesced and shared memory management unit caches to accelerate TLB miss handling , 2013, 2013 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[44] Michael M. Swift,et al. Agile Paging: Exceeding the Best of Nested and Shadow Paging , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[45] Mahmut T. Kandemir,et al. Synergistic TLBs for High Performance Address Translation in Chip Multiprocessors , 2010, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture.