Design tradeoffs for software-managed TLBs

An increasing number of architectures provide virtual memory support through software-managed TLBs. However, software management can impose considerable penalties, and these penalties depend heavily on the operating system's structure and its use of virtual memory. This work explores software-managed TLB design tradeoffs and their interaction with a range of operating systems, including monolithic and microkernel designs. Using hardware monitoring and simulation, we examine TLB performance for benchmarks running on a MIPS R2000-based workstation under Ultrix, OSF/1, and three versions of Mach 3.0. Our results show that new operating systems are changing the relative frequency of different types of TLB misses, and some of these miss types may not be handled efficiently by current architectures. For the same application binaries, total TLB service time varies by as much as an order of magnitude across operating systems. Reducing the handling cost of kernel TLB misses reduces total TLB service time by up to 40%. For TLBs between 32 and 128 slots, each doubling of the TLB size reduces total TLB service time by up to 50%.
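The two levers named above, TLB size and per-miss handling cost, can be illustrated with a simple model in which total TLB service time is the number of misses of each class multiplied by that class's handler cost. The sketch below is a hypothetical illustration, not the simulator or data from this work. It assumes a fully associative TLB with LRU replacement, only two miss classes ("user" and "kernel"), and made-up handler costs in cycles (`MISS_COST`); the names `SoftwareTLB`, `tlb_service_time`, and `cyclic_trace` are likewise invented for illustration.

```python
from collections import OrderedDict

# Hypothetical per-miss handler costs in cycles (illustrative, not measured).
MISS_COST = {"user": 20, "kernel": 300}


class SoftwareTLB:
    """Fully associative TLB; LRU replacement is a simplifying assumption."""

    def __init__(self, slots):
        self.slots = slots
        self.entries = OrderedDict()  # key -> True, least recently used first

    def access(self, key):
        """Return True on a hit; on a miss, refill the entry as a software handler would."""
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            return True
        if len(self.entries) >= self.slots:
            self.entries.popitem(last=False)  # evict the least recently used entry
        self.entries[key] = True
        return False


def tlb_service_time(trace, slots, miss_cost=MISS_COST):
    """Return (total handler cycles, misses per class) for a trace of (class, page) pairs."""
    tlb = SoftwareTLB(slots)
    misses = {kind: 0 for kind in miss_cost}
    for kind, page in trace:
        if not tlb.access((kind, page)):
            misses[kind] += 1
    total = sum(misses[kind] * miss_cost[kind] for kind in misses)
    return total, misses


def cyclic_trace(rounds, user_pages, kernel_pages):
    """Synthetic trace: each round touches every user page, then every kernel page."""
    trace = []
    for _ in range(rounds):
        trace += [("user", p) for p in range(user_pages)]
        trace += [("kernel", p) for p in range(kernel_pages)]
    return trace


if __name__ == "__main__":
    trace = cyclic_trace(rounds=10, user_pages=48, kernel_pages=16)
    for slots in (32, 64, 128):
        total, misses = tlb_service_time(trace, slots)
        print(slots, total, misses)
    # 32 slots: every access misses (57600 cycles); 64 and 128 slots: 5760 cycles.
```

With this synthetic 64-page cyclic working set, going from 32 to 64 slots removes all capacity misses at once. That cliff is an artifact of the trace, not a reproduction of the measured "up to 50% per doubling" result. Halving the hypothetical kernel miss cost at 32 slots (`{"user": 20, "kernel": 150}`) cuts the total from 57600 to 33600 cycles. This shows why expensive kernel misses can dominate service time even when they are outnumbered by user misses.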
