Observations and opportunities in architecting shared virtual memory for heterogeneous systems
暂无分享,去创建一个
Ján Veselý | Mark Oskin | Gabriel H. Loh | Abhishek Bhattacharjee | Arkaprava Basu | M. Oskin | Arkaprava Basu | J. Veselý | A. Bhattacharjee | G. Loh
[1] Jeffrey Stuecheli,et al. CAPI: A Coherent Accelerator Processor Interface , 2015, IBM J. Res. Dev..
[2] Florian Dörfler,et al. Hardware-Accelerated Regular Expression Matching with Overlap Handling on IBM PowerEN Processor , 2013, 2013 IEEE 27th International Symposium on Parallel and Distributed Processing.
[3] Gu-Yeon Wei,et al. Toward Cache-Friendly Hardware Accelerators , 2015 .
[4] Osman S. Unsal,et al. Redundant Memory Mappings for fast access to large memories , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).
[5] Avi Mendelson,et al. DiDi: Mitigating the Performance Impact of TLB Shootdowns Using a Shared TLB Directory , 2011, 2011 International Conference on Parallel Architectures and Compilation Techniques.
[6] Rami G. Melhem,et al. Supporting superpages in non-contiguous physical memory , 2015, 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA).
[7] Abhishek Bhattacharjee,et al. Architectural support for address translation on GPUs: designing memory management units for CPU/GPUs with unified address spaces , 2014, ASPLOS.
[8] Alan L. Cox,et al. SpecTLB: A mechanism for speculative address translation , 2011, 2011 38th Annual International Symposium on Computer Architecture (ISCA).
[9] David A. Wood,et al. Supporting x86-64 address translation for 100s of GPU lanes , 2014, 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA).
[10] Gabriel H. Loh,et al. Increasing TLB reach by exploiting clustering in page translations , 2014, 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA).
[11] Sandia Report,et al. Improving Performance via Mini-applications , 2009 .
[12] Ján Veselý,et al. Large pages and lightweight memory management in virtualized environments: Can you have it both ways? , 2015, 2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[13] Margaret Martonosi,et al. Shared last-level TLBs for chip multiprocessors , 2011, 2011 IEEE 17th International Symposium on High Performance Computer Architecture.
[14] John Goodacre. The evolution of the ARM architecture towards big data and the data-centre (abstract only) , 2013, VHPC '13.
[15] Thomas F. Wenisch,et al. Unlocking bandwidth for GPUs in CC-NUMA systems , 2015, 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA).
[16] Aamer Jaleel,et al. CoLT: Coalesced Large-Reach TLBs , 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.
[17] Alan L. Cox,et al. Translation caching: skip, don't walk (the page table) , 2010, ISCA.
[18] Babak Falsafi,et al. Meet the walkers accelerating index traversals for in-memory databases , 2013, 2013 46th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[19] Abhishek Bhattacharjee,et al. Large-reach memory management unit caches , 2013, MICRO.
[20] Srilatha Manne,et al. Accelerating two-dimensional page walks for virtualized systems , 2008, ASPLOS.
[21] Michael M. Swift,et al. Efficient Memory Virtualization: Reducing Dimensionality of Nested Page Walks , 2014, 2014 47th Annual IEEE/ACM International Symposium on Microarchitecture.
[22] Xin Tong,et al. Prediction-based superpage-friendly TLB designs , 2015, 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA).
[23] Michael M. Swift,et al. Efficient virtual memory for big memory servers , 2013, ISCA.