DiDi: Mitigating the Performance Impact of TLB Shootdowns Using a Shared TLB Directory
Avi Mendelson | Osman S. Unsal | Adrián Cristal | Vasileios Karakostas | Nacho Navarro | Alex Ramírez | Yoav Etsion | Lluís Vilanova | Carlos Villavieja
[1] Bryan S. Rosenburg. Low-synchronization translation lookaside buffer consistency in large-scale shared-memory multiprocessors, 1989, SOSP '89.
[2] Margaret Martonosi, et al. Shared last-level TLBs for chip multiprocessors, 2011, 2011 IEEE 17th International Symposium on High Performance Computer Architecture.
[3] Dan Tsafrir, et al. Effects of clock resolution on the scheduling of interactive and soft real-time processes, 2003, SIGMETRICS '03.
[4] Adrian Schüpbach, et al. The multikernel: a new OS architecture for scalable multicore systems, 2009, SOSP '09.
[5] Margaret Martonosi, et al. Inter-core cooperative TLB for chip multiprocessors, 2010, ASPLOS XV.
[6] David L. Black, et al. Machine-independent virtual memory management for paged uniprocessor and multiprocessor architectures, 1987, IEEE Trans. Computers.
[7] Michael Lang, et al. A Performance Evaluation of the Nehalem Quad-Core Processor for Scientific Computing, 2008, Parallel Process. Lett.
[8] Milo M. K. Martin, et al. Subtleties of transactional memory atomicity semantics, 2006, IEEE Computer Architecture Letters.
[9] A. Ramírez, et al. Scalable Simulation of Decoupled Accelerator Architectures, 2010.
[10] Mahmut T. Kandemir, et al. Synergistic TLBs for High Performance Address Translation in Chip Multiprocessors, 2010, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture.
[11] M. Desnoyers, et al. Combined Tracing of the Kernel and Applications with LTTng, 2010.
[12] Martín Abadi, et al. Transactional memory with strong atomicity using off-the-shelf memory protection hardware, 2009, PPoPP '09.
[13] Norman P. Jouppi, et al. Architecting Efficient Interconnects for Large Caches with CACTI 6.0, 2008, IEEE Micro.
[14] Daniel J. Sorin, et al. UNified Instruction/Translation/Data (UNITD) coherence: One protocol to rule them all, 2010, HPCA - 16 2010 The Sixteenth International Symposium on High-Performance Computer Architecture.
[15] H. Peter Hofstee, et al. Power efficient processor architecture and the cell processor, 2005, 11th International Symposium on High-Performance Computer Architecture.
[16] M. Snir, et al. TLB consistency on highly-parallel shared-memory multiprocessors, 1988, Proceedings of the Twenty-First Annual Hawaii International Conference on System Sciences. Volume I: Architecture Track.
[17] Christoforos E. Kozyrakis, et al. Evaluating MapReduce for Multi-core and Multiprocessor Systems, 2007, 2007 IEEE 13th International Symposium on High Performance Computer Architecture.
[18] Steven S. Lumetta, et al. CUBA: an architecture for efficient CPU/co-processor data communication, 2008, ICS '08.
[19] Nir Shavit, et al. Transactional Locking II, 2006, DISC.
[20] David H. Bailey, et al. NAS parallel benchmark results, 1992, Proceedings Supercomputing '92.
[21] David L. Black, et al. Translation lookaside buffer consistency: a software approach, 1989, ASPLOS III.
[22] Margaret Martonosi, et al. Characterizing the TLB Behavior of Emerging Parallel Workloads on Chip Multiprocessors, 2009, 2009 18th International Conference on Parallel Architectures and Compilation Techniques.
[23] Kunle Olukotun, et al. STAMP: Stanford Transactional Applications for Multi-Processing, 2008, 2008 IEEE International Symposium on Workload Characterization.
[24] Kai Li, et al. The PARSEC benchmark suite: Characterization and architectural implications, 2008, 2008 International Conference on Parallel Architectures and Compilation Techniques (PACT).
[25] Trevor N. Mudge, et al. A look at several memory management units, TLB-refill mechanisms, and page table organizations, 1998, ASPLOS VIII.