Diligent TLBs: a mechanism for exploiting heterogeneity in TLB miss behavior
Gregory T. Byrd | Amro Awad | Hussein Elnawawy | Rangeen Basu Roy Chowdhury
[1] Margaret Martonosi,et al. Inter-Core Cooperative TLB Prefetchers for Chip Multiprocessors , 2010, ASPLOS 2010.
[2] Norman P. Jouppi,et al. A simulation based study of TLB performance , 1992, ISCA '92.
[3] David Roberts,et al. Heterogeneous memory architectures: A HW/SW approach for mixing die-stacked and off-package memories , 2015, 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA).
[4] Anand Sivasubramaniam,et al. Going the distance for TLB prefetching: an application-driven study , 2002, ISCA.
[5] Simon D. Hammond,et al. The Potential and Perils of Multi-Level Memory , 2015, MEMSYS.
[6] Ching-Yung Lin,et al. A Highly Efficient Runtime and Graph Library for Large Scale Graph Analytics , 2014, GRADES.
[7] Per Stenström,et al. Recency-based TLB preloading , 2000, Proceedings of 27th International Symposium on Computer Architecture.
[8] David Roberts,et al. Toward Efficient Programmer-Managed Two-Level Memory Hierarchies in Exascale Computers , 2014, 2014 Hardware-Software Co-Design for High Performance Computing.
[9] Ján Veselý,et al. Large pages and lightweight memory management in virtualized environments: Can you have it both ways? , 2015, 2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[10] Jeff R. Hammond,et al. User Extensible Heap Manager for Heterogeneous Memory Platforms and Mixed Memory Policies , 2015 .
[11] Margaret Martonosi,et al. Shared last-level TLBs for chip multiprocessors , 2011, 2011 IEEE 17th International Symposium on High Performance Computer Architecture.
[12] Harish Patil,et al. Pin: building customized program analysis tools with dynamic instrumentation , 2005, PLDI '05.
[13] Keith D. Underwood,et al. The Structural Simulation Toolkit: A Tool for Bridging the Architectural/Microarchitectural Evaluation Gap , 2004 .
[14] Ching-Yung Lin,et al. GraphBIG: understanding graph computing in the context of industrial solutions , 2015, SC15: International Conference for High Performance Computing, Networking, Storage and Analysis.
[15] Norman P. Jouppi,et al. CACTI 3.0: an integrated cache timing, power, and area model , 2001 .
[16] Margaret Martonosi,et al. Characterizing the TLB Behavior of Emerging Parallel Workloads on Chip Multiprocessors , 2009, 2009 18th International Conference on Parallel Architectures and Compilation Techniques.
[17] Michael M. Swift,et al. Efficient virtual memory for big memory servers , 2013, ISCA.
[18] Christian Bienia,et al. Benchmarking modern multiprocessors , 2011 .
[19] Amro Awad,et al. Samba: A Detailed Memory Management Unit (MMU) for the SST Simulation Framework , 2016 .
[20] Andrew Siegel,et al. XSBench - The Development and Verification of a Performance Abstraction for Monte Carlo Reactor Analysis , 2014 .
[21] Simon David Hammond,et al. memkind: An Extensible Heap Memory Manager for Heterogeneous Memory Platforms and Mixed Memory Policies. , 2015 .
[22] Aamer Jaleel,et al. High performance cache replacement using re-reference interval prediction (RRIP) , 2010, ISCA.
[23] Vivien Quéma,et al. Large Pages May Be Harmful on NUMA Systems , 2014, USENIX Annual Technical Conference.