A survey of techniques for architecting TLBs
[1] Osman S. Unsal,et al. Energy-efficient address translation , 2016, 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[2] Gabriel H. Loh,et al. Entropy-based low power data TLB design , 2006, CASES '06.
[3] André Seznec,et al. A case for two-way skewed-associative caches , 1993, ISCA '93.
[4] Mithuna Thottethodi,et al. PreTrans: Reducing TLB CAM-search via page number prediction and speculative pre-translation , 2013, International Symposium on Low Power Electronics and Design (ISLPED).
[5] Ján Veselý,et al. Observations and opportunities in architecting shared virtual memory for heterogeneous systems , 2016, 2016 IEEE International Symposium on Performance Analysis of Systems and Software (ISPASS).
[6] Rami G. Melhem,et al. PS-TLB: Leveraging page classification information for fast, scalable and efficient translation for future CMPs , 2013, TACO.
[7] Trevor N. Mudge,et al. Uniprocessor Virtual Memory without TLBs , 2001, IEEE Trans. Computers.
[8] James R. Goodman. Coherency for multiprocessor virtual address caches , 1987, ASPLOS 1987.
[9] Anand Sivasubramaniam,et al. Characterizing the d-TLB behavior of SPEC CPU2000 benchmarks , 2002, SIGMETRICS '02.
[10] M. Frans Kaashoek,et al. Software prefetching and caching for translation lookaside buffers , 1994, OSDI '94.
[11] David A. Wood,et al. Supporting x86-64 address translation for 100s of GPU lanes , 2014, 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA).
[12] Gabriel H. Loh,et al. Increasing TLB reach by exploiting clustering in page translations , 2014, 2014 IEEE 20th International Symposium on High Performance Computer Architecture (HPCA).
[13] Per Stenström,et al. Recency-based TLB preloading , 2000, Proceedings of 27th International Symposium on Computer Architecture (IEEE Cat. No.RS00201).
[14] Jeffrey S. Vetter,et al. Opportunities for Nonvolatile Memory Systems in Extreme-Scale High-Performance Computing , 2015, Computing in Science & Engineering.
[15] Yen-Jen Chang. An Ultra Low-Power TLB Design , 2006, Proceedings of the Design Automation & Test in Europe Conference.
[16] Jaehyuk Huh,et al. Efficient synonym filtering and scalable delayed translation for hybrid virtual caching , 2016, International Symposium on Computer Architecture.
[17] Trevor N. Mudge,et al. Design Tradeoffs for Software-Managed TLBs , 1994, Proceedings of the 20th Annual International Symposium on Computer Architecture.
[18] Jeffrey S. Vetter,et al. A Survey of Software Techniques for Using Non-Volatile Memories for Storage and Main Memory Systems , 2016, IEEE Transactions on Parallel and Distributed Systems.
[19] Sparsh Mittal,et al. A Survey of Recent Prefetching Techniques for Processor Caches , 2016, ACM Comput. Surv.
[20] Renato J. O. Figueiredo,et al. On the Performance of Tagged Translation Lookaside Buffers: A Simulation-Driven Analysis , 2011, 2011 IEEE 19th Annual International Symposium on Modelling, Analysis, and Simulation of Computer and Telecommunication Systems.
[21] Yanan Wang,et al. Scattered superpage: A case for bridging the gap between superpage and page coloring , 2013, 2013 IEEE 31st International Conference on Computer Design (ICCD).
[22] Seh-Woong Jeong,et al. A Low Power TLB Structure for Embedded Systems , 2002, IEEE Computer Architecture Letters.
[23] Norman P. Jouppi,et al. A simulation based study of TLB performance , 1992, ISCA '92.
[24] G. Kandiraju,et al. Going the distance for TLB prefetching: an application-driven study , 2002, Proceedings 29th Annual International Symposium on Computer Architecture.
[25] David A. Wood,et al. An in-cache address translation mechanism , 1986, ISCA '86.
[26] Ján Veselý,et al. Large pages and lightweight memory management in virtualized environments: Can you have it both ways? , 2015, 2015 48th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO).
[27] Gurindar S. Sohi,et al. Revisiting virtual L1 caches: A practical design using dynamic synonym remapping , 2016, 2016 IEEE International Symposium on High Performance Computer Architecture (HPCA).
[28] Todd M. Austin,et al. High-Bandwidth Address Translation for Multiple-Issue Processors , 1996, ISCA.
[29] Jang-Suk Park,et al. A software-controlled prefetching mechanism for software-managed TLBs , 1995, Microprocess. Microprogramming.
[30] W. H. Wang,et al. Organization and performance of a two-level virtual-real cache hierarchy , 1989, ISCA '89.
[31] Zhen Fang,et al. Reducing cache and TLB power by exploiting memory region and privilege level semantics , 2013, J. Syst. Archit..
[32] Xin Tong,et al. Prediction-based superpage-friendly TLB designs , 2015, 2015 IEEE 21st International Symposium on High Performance Computer Architecture (HPCA).
[33] Anand Sivasubramaniam,et al. Generating physical addresses directly for saving instruction TLB energy , 2002, 35th Annual IEEE/ACM International Symposium on Microarchitecture, 2002. (MICRO-35). Proceedings.
[34] Margaret Martonosi,et al. Shared last-level TLBs for chip multiprocessors , 2011, 2011 IEEE 17th International Symposium on High Performance Computer Architecture.
[35] Randy H. Katz,et al. Eliminating the address translation bottleneck for physical address cache , 1992, ASPLOS V.
[36] Mahmut T. Kandemir,et al. Compiler-directed code restructuring for reducing data TLB energy , 2004, International Conference on Hardware/Software Codesign and System Synthesis, 2004. CODES + ISSS 2004.
[37] Aviral Shrivastava,et al. B2P2: bounds based procedure placement for instruction TLB power reduction in embedded systems , 2010, SCOPES.
[38] Leigh Stoller,et al. Increasing TLB reach using superpages backed by shadow memory , 1998, ISCA.
[39] Michael M. Swift,et al. Reducing memory reference energy with opportunistic virtual caching , 2012, 2012 39th Annual International Symposium on Computer Architecture (ISCA).
[40] Hsien-Hsin S. Lee,et al. Synonymous address compaction for energy reduction in data TLB , 2005, ISLPED '05. Proceedings of the 2005 International Symposium on Low Power Electronics and Design, 2005.
[41] Xin Tong,et al. BarTLB: Barren page resistant TLB for managed runtime languages , 2014, 2014 IEEE 32nd International Conference on Computer Design (ICCD).
[42] Abhishek Bhattacharjee,et al. Large-reach memory management unit caches , 2013, MICRO.
[44] Mahmut T. Kandemir,et al. Generating physical addresses directly for saving instruction TLB energy , 2002, MICRO.
[45] Albert Y. Zomaya,et al. A Survey of Mobile Device Virtualization , 2016, ACM Comput. Surv.
[46] Yiran Chen,et al. STD-TLB: A STT-RAM-based dynamically-configurable translation lookaside buffer for GPU architectures , 2014, 2014 19th Asia and South Pacific Design Automation Conference (ASP-DAC).
[47] William D. Strecker. VAX-11/780: A Virtual Address Extension to the DEC PDP-11 Family , 1978 .
[48] Aamer Jaleel,et al. CoLT: Coalesced Large-Reach TLBs , 2012, 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture.
[49] L.T. Clark,et al. A low-power 2.5-GHz 90-nm level 1 cache and memory management unit , 2005, IEEE Journal of Solid-State Circuits.
[50] William D. Strecker,et al. VAX-11/780 - A virtual address extension to the DEC PDP-11 family , 1978, AFIPS National Computer Conference.
[51] Michel Dubois,et al. The Synonym Lookaside Buffer: A Solution to the Synonym Problem in Virtual Caches , 2008, IEEE Transactions on Computers.
[52] Mark D. Hill,et al. Surpassing the TLB performance of superpages with less operating system support , 1994, ASPLOS VI.
[53] Juan E. Navarro,et al. Practical, transparent operating system support for superpages , 2002, OSDI '02.
[54] Brian N. Bershad,et al. Reducing TLB and memory overhead using online superpage promotion , 1995, Proceedings 22nd Annual International Symposium on Computer Architecture.
[55] Aviral Shrivastava,et al. Code Transformations for TLB Power Reduction , 2009, VLSI Design.
[56] David B. Whalley,et al. Designing a practical data filter cache to improve both energy efficiency and performance , 2013, ACM Trans. Archit. Code Optim..
[57] Trevor N. Mudge,et al. A look at several memory management units, TLB-refill mechanisms, and page table organizations , 1998, ASPLOS VIII.
[58] Mahmut T. Kandemir,et al. Reducing Data TLB Power via Compiler-Directed Address Generation , 2007, IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems.
[59] Margaret Martonosi,et al. Characterizing the TLB Behavior of Emerging Parallel Workloads on Chip Multiprocessors , 2009, 2009 18th International Conference on Parallel Architectures and Compilation Techniques.
[60] Hsien-Hsin S. Lee,et al. Improving TLB energy for java applications on JVM , 2008, 2008 International Conference on Embedded Computer Systems: Architectures, Modeling, and Simulation.
[61] Hsien-Hsin S. Lee,et al. Energy efficient D-TLB and data cache using semantic-aware multilateral partitioning , 2003, ISLPED '03.
[62] Jang-Soo Lee,et al. A banked-promotion TLB for high performance and low power , 2001, Proceedings 2001 IEEE International Conference on Computer Design: VLSI in Computers and Processors. ICCD 2001.
[63] Tomás Lang,et al. Reducing TLB power requirements , 1997, Proceedings of 1997 International Symposium on Low Power Electronics and Design.
[64] Mahmut T. Kandemir,et al. Synergistic TLBs for High Performance Address Translation in Chip Multiprocessors , 2010, 2010 43rd Annual IEEE/ACM International Symposium on Microarchitecture.
[65] Collin McCurdy,et al. Investigating the TLB Behavior of High-end Scientific Applications on Commodity Microprocessors , 2008, ISPASS 2008 - IEEE International Symposium on Performance Analysis of Systems and software.
[66] Jeffrey S. Vetter,et al. A Survey of CPU-GPU Heterogeneous Computing Techniques , 2015, ACM Comput. Surv.
[67] Margaret Martonosi,et al. TLB Improvements for Chip Multiprocessors: Inter-Core Cooperative Prefetchers and Shared Last-Level TLBs , 2013, TACO.
[68] Peter Petrov,et al. Context-aware TLB preloading for interference reduction in embedded multi-tasked systems , 2010, GLSVLSI '10.
[69] Ching-Wen Chen,et al. Energy-efficient synonym data detection and consistency for virtual cache , 2016, Microprocess. Microsystems.
[70] Mark D. Hill,et al. Tradeoffs in supporting two page sizes , 1992, ISCA '92.
[71] Margaret Martonosi,et al. Inter-core cooperative TLB for chip multiprocessors , 2010, ASPLOS XV.
[72] Michel Cekleov,et al. Virtual-address caches. Part 1: problems and solutions in uniprocessors , 1997, IEEE Micro.
[73] Mahmut T. Kandemir,et al. Reducing dTLB energy through dynamic resizing , 2003, Proceedings 21st International Conference on Computer Design.
[74] Sparsh Mittal,et al. Exploring Design Space of 3D NVM and eDRAM Caches Using DESTINY Tool (open-source code) , 2015 .
[75] Mahmut T. Kandemir,et al. Optimizing instruction TLB energy using software and hardware techniques , 2005, TODE.
[76] Daeyeon Park,et al. Boosting superpage utilization with the shadow memory and the partial-subblock TLB , 2000, ICS '00.
[77] Stefanos Kaxiras,et al. A new perspective for efficient virtual-cache coherence , 2013, ISCA.
[78] Abhishek Bhattacharjee,et al. Architectural support for address translation on GPUs: designing memory management units for CPU/GPUs with unified address spaces , 2014, ASPLOS.
[79] Antonio Robles,et al. Efficient TLB-Based Detection of Private Pages in Chip Multiprocessors , 2016, IEEE Transactions on Parallel and Distributed Systems.
[80] Renato J. O. Figueiredo,et al. TMT - A TLB Tag Management Framework for Virtualized Platforms , 2009, SBAC-PAD.
[81] Ryan N. Rakvic,et al. A comprehensive study of hardware/software approaches to improve TLB performance for java applications on embedded systems , 2006, MSPC '06.
[82] Alan L. Cox,et al. SpecTLB: A mechanism for speculative address translation , 2011, 2011 38th Annual International Symposium on Computer Architecture (ISCA).
[83] Peter Davies,et al. The TLB slice—a low-cost high-speed address translation mechanism , 1990, ISCA '90.
[84] Rajeev Balasubramonian,et al. A Dynamically Tunable Memory Hierarchy , 2003, IEEE Trans. Computers.
[85] Osman S. Unsal,et al. Redundant Memory Mappings for fast access to large memories , 2015, 2015 ACM/IEEE 42nd Annual International Symposium on Computer Architecture (ISCA).
[86] Avi Mendelson,et al. DiDi: Mitigating the Performance Impact of TLB Shootdowns Using a Shared TLB Directory , 2011, 2011 International Conference on Parallel Architectures and Compilation Techniques.
[87] Peter Davies,et al. The TLB slice-a low-cost high-speed address translation mechanism , 1990, [1990] Proceedings. The 17th Annual International Symposium on Computer Architecture.
[88] Daniel J. Sorin,et al. UNified Instruction/Translation/Data (UNITD) coherence: One protocol to rule them all , 2010, HPCA - 16 2010 The Sixteenth International Symposium on High-Performance Computer Architecture.