Devirtualizing Memory in Heterogeneous Systems
[1] Srilatha Manne,et al. Accelerating two-dimensional page walks for virtualized systems , 2008, ASPLOS.
[2] Kiyoung Choi,et al. A scalable processing-in-memory accelerator for parallel graph processing , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).
[3] Paul C. Kocher,et al. Timing Attacks on Implementations of Diffie-Hellman, RSA, DSS, and Other Systems , 1996, CRYPTO.
[4] David A. Wood,et al. Border control: Sandboxing accelerators , 2015, 2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[5] David H. Bailey,et al. The NAS parallel benchmarks summary and preliminary results , 1991, Proceedings of the 1991 ACM/IEEE Conference on Supercomputing (Supercomputing '91).
[6] Abhishek Bhattacharjee,et al. Large-reach memory management unit caches , 2013, MICRO.
[7] Somayeh Sardashti,et al. The gem5 simulator , 2011, CARN.
[8] Margaret Martonosi,et al. Inter-core cooperative TLB for chip multiprocessors , 2010, ASPLOS XV.
[9] Margaret Martonosi,et al. Shared last-level TLBs for chip multiprocessors , 2011, 2011 IEEE 17th International Symposium on High Performance Computer Architecture.
[10] Parthasarathy Ranganathan,et al. From Microprocessors to Nanostores: Rethinking Data-Centric Systems , 2011, Computer.
[11] Brian W. Barrett,et al. Introducing the Graph 500 , 2010 .
[12] James Bennett,et al. The Netflix Prize , 2007 .
[13] John L. Henning. SPEC CPU2006 benchmark descriptions , 2006, CARN.
[14] Hovav Shacham,et al. On the effectiveness of address-space randomization , 2004, CCS '04.
[15] Norman P. Jouppi,et al. Optimizing NUCA Organizations and Wiring Alternatives for Large Caches with CACTI 6.0 , 2007, 40th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO 2007).
[16] Margaret Martonosi,et al. Graphicionado: A high-performance and energy-efficient accelerator for graph analytics , 2016, 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[17] Osman S. Unsal,et al. Redundant Memory Mappings for fast access to large memories , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).
[18] Aamer Jaleel,et al. CoLT: Coalesced Large-Reach TLBs , 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.
[19] Karthikeyan Sankaralingam,et al. Stream-dataflow acceleration , 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[20] Osman S. Unsal,et al. Energy-efficient address translation , 2016, 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[21] Herbert Bos,et al. ASLR on the Line: Practical Cache Attacks on the MMU , 2017, NDSS.
[22] Pradeep Dubey,et al. Navigating the maze of graph analytics frameworks using massive graph datasets , 2014, SIGMOD Conference.
[23] Christos Faloutsos,et al. R-MAT: A Recursive Model for Graph Mining , 2004, SDM.
[24] Andrew Siegel,et al. XSBench - The Development and Verification of a Performance Abstraction for Monte Carlo Reactor Analysis , 2014 .
[25] David A. Wood,et al. Supporting x86-64 address translation for 100s of GPU lanes , 2014, 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA).
[26] Christian Bienia,et al. Benchmarking modern multiprocessors , 2011 .
[27] Luca Benini,et al. Lightweight virtual memory support for many-core accelerators in heterogeneous embedded SoCs , 2015, 2015 International Conference on Hardware/Software Codesign and System Synthesis (CODES+ISSS).
[28] Jason Cong,et al. Supporting Address Translation for Accelerator-Centric Architectures , 2017, 2017 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[29] Parag Agrawal,et al. The case for RAMClouds: scalable high-performance storage entirely in DRAM , 2010, OPSR.
[30] Alan L. Cox,et al. Translation caching: skip, don't walk (the page table) , 2010, ISCA.
[31] Krste Asanovic,et al. Mondrian memory protection , 2002, ASPLOS X.
[32] Jeffrey S. Chase,et al. Architecture support for single address space operating systems , 1992, ASPLOS V.
[33] Song Han,et al. EIE: Efficient Inference Engine on Compressed Deep Neural Network , 2016, 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA).
[34] Herbert Bos,et al. Undermining Information Hiding (and What to Do about It) , 2016, USENIX Security Symposium.
[35] David A. Patterson,et al. In-datacenter performance analysis of a tensor processing unit , 2017, 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA).
[36] Hovav Shacham,et al. Return-Oriented Programming: Systems, Languages, and Applications , 2012, TSEC.
[37] Joel Emer,et al. Eyeriss: a spatial architecture for energy-efficient dataflow for convolutional neural networks , 2016, CARN.
[38] Abhishek Bhattacharjee,et al. Efficient Address Translation for Architectures with Multiple Page Sizes , 2017, ASPLOS.
[39] Timothy A. Davis,et al. The University of Florida sparse matrix collection , 2011, TOMS.
[40] Kenneth A. Ross,et al. Q100: the architecture and design of a database processing unit , 2014, ASPLOS.
[41] Michael M. Swift,et al. Efficient virtual memory for big memory servers , 2013, ISCA.
[42] Simha Sethumadhavan,et al. Security Implications of Third-Party Accelerators , 2016, IEEE Computer Architecture Letters.
[43] Mark D. Hill,et al. Tradeoffs in supporting two page sizes , 1992, ISCA '92.
[44] Manos Athanassoulis,et al. Beyond the Wall: Near-Data Processing for Databases , 2015, DaMoN.
[45] Michael M. Swift,et al. BadgerTrap: a tool to instrument x86-64 TLB misses , 2014, CARN.
[46] Abhishek Bhattacharjee,et al. Architectural support for address translation on GPUs: designing memory management units for CPU/GPUs with unified address spaces , 2014, ASPLOS.