Surpassing the TLB performance of superpages with less operating system support

Many commercial microprocessor architectures have added translation lookaside buffer (TLB) support for superpages. Superpages differ from segments because their size must be a power-of-two multiple of the base page size and they must be aligned in both virtual and physical address spaces. Very large superpages (e.g., 1MB) are clearly useful for mapping special structures, such as kernel data or frame buffers. This paper considers the architectural and operating system support required to exploit medium-sized superpages (e.g., 64KB, i.e., sixteen times a 4KB base page size). First, we show that superpages improve TLB performance only after invasive operating system modifications that introduce considerable overhead. We then propose two subblock TLB designs as alternative ways to improve TLB performance. Analogous to a subblock cache, a complete-subblock TLB associates a tag with a superpage-sized region but keeps a separate valid bit, physical page number, and set of attributes for each possible base page mapping. A partial-subblock TLB entry is much smaller than a complete-subblock TLB entry, because it shares the physical page number and attribute fields across base page mappings. A drawback of a partial-subblock TLB is that base page mappings can share a TLB entry only if they map to consecutive physical pages and have the same attributes. We propose a physical memory allocation algorithm, page reservation, that makes this sharing more likely. When page reservation is used, experimental results show that partial-subblock TLBs perform better than superpage TLBs while requiring simpler operating system changes. If operating system changes are inappropriate, however, complete-subblock TLBs perform best.
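
The sharing rule for partial-subblock entries and the effect of page reservation can be illustrated with a small software model. The following Python sketch is not taken from the paper; it is a minimal illustration under stated assumptions. The class names (PartialSubblockEntry, PartialSubblockTLB, ReservingAllocator), the unbounded fully associative organization, and the bump-pointer reservation policy are all hypothetical. The model stores one shared physical base per entry and adds the page index to it, which captures the "consecutive physical pages" condition; real hardware may additionally require the physical block to be aligned so the index bits can be concatenated rather than added.

```python
from dataclasses import dataclass

PAGE_SHIFT = 12        # 4KB base pages, as in the abstract's example
SUBBLOCK_FACTOR = 16   # 64KB region = sixteen base pages


@dataclass
class PartialSubblockEntry:
    vblock: int        # tag: virtual page number // SUBBLOCK_FACTOR
    ppn_base: int      # shared field: physical page that index 0 would map to
    attrs: str         # shared attributes (e.g. protection), assumed a string here
    valid: int = 0     # one valid bit per base page in the region


class PartialSubblockTLB:
    """Toy model: fully associative, unbounded, no replacement policy."""

    def __init__(self):
        self.entries = []

    def lookup(self, vaddr):
        vblock, idx = divmod(vaddr >> PAGE_SHIFT, SUBBLOCK_FACTOR)
        offset = vaddr & ((1 << PAGE_SHIFT) - 1)
        for e in self.entries:
            if e.vblock == vblock and (e.valid >> idx) & 1:
                return ((e.ppn_base + idx) << PAGE_SHIFT) | offset
        return None  # TLB miss

    def fill(self, vpn, ppn, attrs):
        # Assumes each vpn is filled at most once (no remapping in this sketch).
        vblock, idx = divmod(vpn, SUBBLOCK_FACTOR)
        base = ppn - idx
        for e in self.entries:
            # Share an entry only if the physical pages line up consecutively
            # and the attributes are identical.
            if (e.vblock, e.ppn_base, e.attrs) == (vblock, base, attrs):
                e.valid |= 1 << idx
                return
        self.entries.append(PartialSubblockEntry(vblock, base, attrs, 1 << idx))


class ReservingAllocator:
    """Hypothetical page reservation: the first fault in a virtual region
    reserves a whole aligned physical block; later faults in that region
    receive the matching frame. Memory pressure is ignored."""

    def __init__(self):
        self.next_block = 0
        self.reserved = {}  # vblock -> physical block number

    def alloc(self, vpn):
        vblock, idx = divmod(vpn, SUBBLOCK_FACTOR)
        if vblock not in self.reserved:
            self.reserved[vblock] = self.next_block
            self.next_block += 1
        return self.reserved[vblock] * SUBBLOCK_FACTOR + idx


def demo():
    tlb, alloc = PartialSubblockTLB(), ReservingAllocator()
    # Three pages fault in one 64KB region, out of order, plus one elsewhere.
    for vpn in (0x40, 0x43, 0x41, 0x100):
        tlb.fill(vpn, alloc.alloc(vpn), "rw")
    return len(tlb.entries), tlb.lookup((0x43 << PAGE_SHIFT) | 0x10)


if __name__ == "__main__":
    print(demo())
```

In this model, the three pages from the same region end up in one entry because the allocator placed them at matching offsets within a reserved block. An allocator that handed out arbitrary frames would typically produce a different base for each page, forcing one entry per page.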
