Recency-based TLB preloading

Caching and other latency-tolerating techniques have been quite successful in maintaining high memory system performance for general-purpose processors. However, translation lookaside buffer (TLB) misses have become a serious bottleneck as working sets grow beyond the capacity of TLBs. This paper presents one of the first attempts to hide TLB miss latency by using preloading techniques. We present results for traditional next-page TLB miss preloading, an approach shown to cut some of the misses. A key contribution of this work, however, is a novel TLB miss prediction algorithm based on the concept of "recency", and we show that it can predict over 55% of the TLB misses for the five commercial applications considered.
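To make the next-page baseline concrete, the following is a minimal sketch of next-page preloading in a toy TLB simulator. It is not the paper's evaluation setup: the fully associative LRU organization, the choice to preload directly into the TLB rather than into a separate buffer, and the names `TLB`, `touch`, and `count_misses` are all assumptions made for illustration. The recency-based predictor is not sketched here because the abstract does not describe its mechanism.

```python
from collections import OrderedDict


class TLB:
    """Toy fully associative TLB with LRU replacement (assumed organization)."""

    def __init__(self, entries):
        self.entries = entries
        self.pages = OrderedDict()  # virtual page number -> True, LRU first

    def contains(self, vpn):
        return vpn in self.pages

    def touch(self, vpn):
        """Insert vpn, or mark it most recently used; evict the LRU entry if full."""
        if vpn in self.pages:
            self.pages.move_to_end(vpn)
            return
        if len(self.pages) >= self.entries:
            self.pages.popitem(last=False)  # drop least recently used entry
        self.pages[vpn] = True


def count_misses(trace, entries, preload_next=False):
    """Count TLB misses over a trace of virtual page numbers.

    With preload_next=True, each miss on page v also loads the translation
    for page v + 1 (hypothetical next-page preloading policy).
    """
    tlb = TLB(entries)
    misses = 0
    for vpn in trace:
        if tlb.contains(vpn):
            tlb.touch(vpn)
            continue
        misses += 1
        tlb.touch(vpn)
        if preload_next:
            tlb.touch(vpn + 1)
    return misses


if __name__ == "__main__":
    sequential = list(range(16))
    print(count_misses(sequential, entries=4))                     # 16
    print(count_misses(sequential, entries=4, preload_next=True))  # 8
```

On a purely sequential page trace this policy halves the miss count, since every preloaded page is used next. On traces without sequential locality it offers no benefit and can evict useful entries, which is the kind of limitation that motivates a predictor based on past reference behavior.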
